Abstract

This work is concerned with approximating constraint satisfaction problems
(CSPs) with additional global cardinality constraints. For example, Max Cut
is a boolean CSP where the input is a graph $G = (V, E)$ and the goal is to find a
cut $S \cup \bar{S} = V$ that maximizes the number of crossing edges, $|E(S, \bar{S})|$.
The Max Bisection problem is a variant of Max Cut with an additional global constraint
that each side of the cut has exactly half the vertices, i.e., $|S| = |V|/2$. Several
other natural optimization problems like Min Bisection and approximating Graph
Expansion can be formulated as CSPs with global constraints.

In this work, we formulate a general approach towards approximating CSPs 
with global constraints using SDP hierarchies. To demonstrate the approach we 
present the following results: 

- Using the Lasserre hierarchy, we present an algorithm that runs in time
$O(n^{\mathrm{poly}(1/\epsilon)})$ and, given an instance of Max Bisection with value
$1 - \epsilon$, finds a bisection with value $1 - O(\sqrt{\epsilon})$. This approximation
is near-optimal (up to constant factors in $O(\cdot)$) under the Unique Games Conjecture.

- By a computer-assisted proof, we show that the same algorithm also achieves 
a 0.85-approximation for Max Bisection, improving on the previous bound 
of 0.70 (note that it is Unique Games hard to approximate better than a 0.878 
factor). The same algorithm also yields a 0.92-approximation for Max 2-Sat 
with cardinality constraints. 

- For every CSP with global cardinality constraints, we present a generic
conversion from integrality gap instances for the Lasserre hierarchy to a
dictatorship test whose soundness is at most the integrality gap. Dictatorship testing
gadgets are central to hardness results for CSPs, and a generic conversion of
the above nature lies at the core of the tight Unique Games based hardness
result for CSPs [Rag08].
1 Introduction



Constraint Satisfaction Problems (CSP) are a class of fundamental optimization prob- 
lems that have been extensively studied in approximation algorithms and hardness of 
approximation. In a constraint satisfaction problem, the input consists of a set of vari- 
ables taking values over a fixed finite domain (say {0, 1}) and a set of local constraints 
on them. The constraints are local in that each of them depends on at most k variables 
for some fixed constant k. The goal is to find an assignment to the variables that satisfies 
the maximum number of constraints. 

Over the last two decades, there has been much progress in understanding the ap- 
proximability of CSPs. On the algorithmic front, semidefinite programming (SDP) has 
been used with great success in approximating several well-known CSPs such as Max 
Cut [GW95], Max 2-Sat [CMM07] and Max 3-Sat [KZ97]. More recently, these algo- 
rithmic results have been unified and generalized to the entire class of constraint satisfac- 
tion problems [RS09a]. With the development of PCPs and long code based reductions, 
tight hardness results matching the SDP based algorithms have been shown for some 
CSPs such as Max-3-SAT [H01]. In a surprising development under the Unique Games
Conjecture, semidefinite programming based algorithms have been shown to be opti- 
mal for Max Cut [KKMO07], Max 2-Sat [Aus07] and more generally every constraint 
satisfaction problem [Rag08]. 

Unfortunately, neither SDP based algorithms nor the hardness results extend satisfac- 
torily to optimization problems with non-local constraints. Part of the reason is that the 
nice framework of SDP based approximation algorithms and matching hardness results 
crucially rely on the locality of the constraints involved. Perhaps the simplest non-local 
constraint would be to restrict the cardinality of the assignment, i.e., the number of ones 
in the assignment. Variants of CSPs with even a single cardinality constraint are not 
well-understood. Optimization problems of this nature, namely constraint satisfaction
problems with global cardinality constraints, are the primary focus of this work. Several
important problems such as Max Bisection, Min Bisection, Small-Set Expansion can be 
formulated as CSPs with a single global cardinality constraint. 

As an illustrative example, let us consider the Max Bisection problem, which is also
part of the focus of this work. The Max Bisection problem is a variant of the
well-studied Max Cut problem [GW95, KKMO07]. In the Max Cut problem the goal is to
partition the vertices of the input graph into two sets while maximizing the number of
crossing edges. The Max Bisection problem includes an additional cardinality constraint
that both sides of the partition have exactly half the vertices of the graph. The seemingly
mild cardinality constraint appears to change the nature of the problem. While Max Cut
admits a factor 0.878 approximation algorithm [GW95], the best known approximation
factor for Max Bisection equals 0.7027 [FL06], improving on previous bounds of
0.6514 [FJ97], 0.699 [Ye01], and 0.7016 [HZ02]. These algorithms proceed by rounding
the natural semidefinite programming relaxation analogous to the Goemans-Williamson
SDP for Max Cut. Guruswami et al. [GMR+11] showed that this natural SDP relaxation
has a large integrality gap: the SDP optimum could be 1 whereas every bisection
might cut less than a 0.95 fraction of the edges! In particular, this implies that none
of these algorithms guarantee a solution with value close to 1 even if there exists a
perfect bisection in the graph. More recently, using a combination of graph decomposition,
brute-force enumeration and SDP rounding, Guruswami et al. [GMR+11] obtained an
algorithm that outputs a bisection of value $1 - O(\epsilon^{1/3} \log(1/\epsilon))$ on a
graph that has a bisection of value $1 - \epsilon$.

A simple approximation preserving reduction from Max Cut shows that Max Bisection
is no easier to approximate than Max Cut (the reduction is simply to take two
disjoint copies of the Max Cut instance). Therefore, the factor 16/17 NP-hardness
[H01, TSSW00] and the factor 0.878 Unique-Games hardness for Max Cut [KKMO07]
also apply to the Max Bisection problem. In fact, a stronger hardness result of factor
15/16 was shown in [HK04] assuming
$\mathrm{NP} \not\subseteq \bigcap_{\gamma > 0} \mathrm{TIME}(2^{n^{\gamma}})$. Yet, these
hardness results for Max Bisection are far from matching the best known approximation
algorithm, which only achieves a 0.702 factor.
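
The two-copies reduction mentioned above can be made concrete. The following is a
minimal sketch, assuming graphs are given as edge lists over vertices 0..n-1; the
function names are hypothetical and chosen only for illustration. Given a cut of G,
giving the second copy the complementary cut yields an exact bisection of the doubled
graph that cuts exactly twice as many edges.

```python
def double_graph(n, edges):
    """Return two disjoint copies of a graph on vertices 0..n-1.

    Copy 1 keeps the original labels; copy 2 uses labels n..2n-1.
    """
    shifted = [(u + n, v + n) for (u, v) in edges]
    return 2 * n, list(edges) + shifted


def cut_value(edges, side):
    """Number of edges whose endpoints lie on different sides."""
    return sum(1 for (u, v) in edges if side[u] != side[v])


def cut_to_bisection(side):
    """Extend a 0/1 cut of G to a bisection of the doubled graph.

    The second copy receives the complementary cut, so each side
    contains exactly n of the 2n vertices.
    """
    return list(side) + [1 - s for s in side]


# Example: a triangle, whose maximum cut has 2 edges.
n, edges = 3, [(0, 1), (1, 2), (0, 2)]
side = [0, 1, 1]                      # a maximum cut of the triangle
n2, edges2 = double_graph(n, edges)
bisection = cut_to_bisection(side)
```

In the other direction, any bisection of the doubled graph induces a cut on each copy,
and the better of the two cuts at least half of the bisection's edges, which is why
the approximation ratio is preserved.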

SDP Hierarchies. Almost all known approximation algorithms for constraint satis- 
faction problems are based on a fairly minimal SDP relaxation of the problem. In 
fact, there exists a simple semidefinite program with linear number of constraints (see 
[Rag08, RS09a]) that yields the best known approximation ratio for every CSP. This
leaves open the possibility that stronger SDP relaxations such as those obtained using the
Lovász-Schrijver, Sherali-Adams and Lasserre SDP hierarchies yield better
approximations for CSPs. Unfortunately, there is evidence suggesting that the stronger SDP
relaxations yield no better approximation for CSPs than the simple semidefinite program
suggested in [Rag08, RS09a]. First, under the Unique Games Conjecture, it is NP-hard to
approximate any CSP to a factor better than that yielded by the simple semidefinite pro- 
gram [Rag08]. Moreover, a few recent works [KS09, Tul09, RS09b] have constructed in- 
tegrality gap instances for strong SDP relaxations of CSPs, obtained via Sherali-Adams 
and Lasserre hierarchies. For instance, the integrality gap instances in [KS09, RS09b]
demonstrate that up to $(\log \log n)^{\Omega(1)}$ rounds of the Sherali-Adams SDP
hierarchy yield no better approximation to Max Cut than the simple Goemans-Williamson
semidefinite program [GW95].

The situation for CSPs with cardinality constraints promises to be different. For the
Balanced Separator problem, a CSP with a global cardinality constraint, Arora et al.
[ARV04] obtained an improved approximation of $O(\sqrt{\log n})$ by appealing to a
stronger SDP relaxation with triangle inequalities. In case of Max Bisection, one of the
components of the algorithm of [GMR+11] is a brute-force search, a technique that could
quite possibly be carried out using SDP hierarchies.

Despite their promise, there are only a handful of applications of SDP hierarchies
to approximation algorithms, most notably to approximating graph expansion [ARV04],
graph coloring and hypergraph independent sets. Moreover, there are few general
techniques to round solutions to SDP hierarchies and analyze their integrality gap.

In an exciting development, fairly general techniques to round solutions to SDP
hierarchies (particularly the Lasserre hierarchy) have emerged in recent works by Barak et
al. [BRS11] and Guruswami and Sinop [GS11]. Both these works (concurrently and
independently) developed a fairly general approach to round solutions to the Lasserre 
hierarchy using an appropriate notion of local-global correlations in the SDP solution. 
As an application of the technique, both the works obtain a subexponential time algo- 
rithm for the Unique Games problem using the Lasserre SDP hierarchy. These works 
also demonstrate several interesting applications of the technique. 

Barak et al. [BRS11] obtain an algorithm for arbitrary 2-CSPs with an approximation
guarantee depending on the spectrum of the input graph. Specifically, the result
implies a quasi-polynomial time approximation scheme for every 2-CSP on low thresh- 
old rank graphs, namely graphs with few large eigenvalues. 

Guruswami and Sinop [GS11] obtain a general algorithm to optimize quadratic in- 
teger programs with positive semidefinite forms and global linear constraints. Several 
interesting problems including 2-CSPs with global cardinality constraints such as Max 
Bisection, Min Bisection and Balanced Separator fall into the framework of [GS11].
However, the approximation guarantee of their algorithm depends on the spectrum of 
the input graph, and is therefore effective only on the special class of low threshold rank 
graphs. 

1.1 Our Results 

In this paper, we develop a general approach to approximate CSPs with global cardinal- 
ity constraints using the Lasserre SDP hierarchy. 

We illustrate the approach with an improved approximation algorithm for the Max 
Bisection and balanced Max 2-Sat problems. For the Max Bisection problem, we show 
the following result. 

Theorem 1.1. For every $\delta > 0$, there exists an algorithm for Max Bisection that runs
in time $O(n^{\mathrm{poly}(1/\delta)})$ and obtains the following approximation guarantees,

- The output bisection has value at least $0.85 - \delta$ times the optimal max bisection.

- For every $\epsilon > 0$, given an instance $G$ with a bisection of value $1 - \epsilon$,
the algorithm outputs a bisection of value at least $1 - O(\sqrt{\epsilon}) - \delta$.

Note that the approximation guarantee of $1 - O(\sqrt{\epsilon})$ on instances with value
$1 - \epsilon$ is nearly optimal (up to constant factors in the $O(\cdot)$) under the
Unique Games Conjecture. This follows from the corresponding hardness of Max Cut and the
reduction from Max Cut to Max Bisection.

Our approach is robust in that it also yields similar approximation guarantees for the
more general $\alpha$-Max Cut problem where the goal is to find a cut with exactly an
$\alpha$-fraction of vertices on one side of the cut. More generally, the algorithm also
generalizes to a weighted version of Max Bisection, where the vertices have weights and
the cut has approximately half the weight on each side.

The same algorithm also yields an approximation to the complementary problem
of Min Bisection. Formally, we obtain the following approximation algorithm for Min
Bisection and $\alpha$-Balanced Separator.

Theorem 1.2. For every $\delta > 0$, there exists an algorithm running in time
$O(n^{\mathrm{poly}(1/\delta)})$, which given a graph with a bisection ($\alpha$-balanced
separator) cutting an $\epsilon$-fraction of the edges, finds a bisection
($\alpha$-balanced separator) cutting at most an $O(\sqrt{\epsilon}) + \delta$ fraction of
edges.

Towards showing matching hardness results for CSPs with cardinality constraints,
we construct a dictatorship test for these problems. Dictatorship testing gadgets lie at the 
heart of all optimal hardness of approximation results for CSPs (both NP-hardness and 
unique games based hardness results). In fact, using techniques from the work of Khot 
et al. [KKMO07], any dictatorship test for a CSP yields a corresponding unique games 
based hardness result. More generally, a large fraction of hardness of approximation 
results (not necessarily CSPs) have an underlying dictatorship testing gadget. 

Building on earlier works, Raghavendra [Rag08] exhibited a generic reduction that
converts an arbitrary integrality gap instance for a certain SDP relaxation of a CSP
into a dictatorship test for the same CSP. In turn, this implied optimal hardness results
matching the integrality gap of the SDP under the unique games conjecture. Using
techniques from [Rag08], we exhibit a generic reduction from integrality gap instances
for the Lasserre SDP relaxation of a CSP with cardinality constraints to a dictatorship
test for the same. While the reduction applies in general for every CSP with cardinality
constraints, for the sake of exposition, we present the special case of Max Bisection.
For Max Bisection, we show the following.

Theorem 1.3. (Informal Statement) For every $\epsilon, \delta > 0$, given an integrality
gap instance for the $\mathrm{poly}(1/\epsilon)$-round Lasserre SDP for Max Bisection, with
SDP value $c$ and optimum integral value $s$, there exists a dictatorship test for Max
Bisection with completeness $c - O(\epsilon + \delta)$ and soundness
$s + O(\epsilon + \delta)$.

The formal statement of the result and its proof are presented in Section 6.
Unfortunately, this dictatorship test does not yet translate into a corresponding hardness
result for Max Bisection. First, observe that the framework of Khot et al. [KKMO07] to show
unique games based hardness results does not apply to Max Bisection due to the global
constraint on the instance. This is the same reason why the unique games conjecture
is not known to imply hardness results for Balanced Separator. The reason is that
the hard instances of these problems are required to have certain global structure (such
as expansion in case of Balanced Separator). In case of Max Bisection, a hard
instance must not decompose into sets of small size ($\epsilon n$ vertices), else the
global balance condition can be easily satisfied by appropriately flipping the cut in each
set independently. Gadget reductions from a unique games instance preserve the global
properties of the unique games instance such as lack of expansion. Therefore, showing
hardness for the Balanced Separator or Max Bisection problems requires a stronger
assumption such as unique games with expansion or the Small Set Expansion Hypothesis [RS10].

Footnote 1. Note that in the weighted case, finding any exact bisection is at least as
hard as the subset-sum problem.

2 Overview of Techniques 

In this section, we outline our approach to approximating the Max Bisection problem.
The techniques are fairly general and can be applied to other CSPs with global
cardinality constraints.

Global Correlation. For the sake of exposition, let us recall the Goemans and
Williamson algorithm for Max Cut. Given a graph $G = (V, E)$, the Goemans-Williamson
SDP relaxation for Max Cut assigns a unit vector $v_i$ for every vertex $i \in V$, so as to
maximize the average squared length $\mathbb{E}_{(i,j) \in E} \|v_i - v_j\|^2$ of the
edges. Formally, the SDP relaxation is given by,

$$\text{maximize } \mathbb{E}_{(i,j) \in E} \|v_i - v_j\|^2 \quad \text{subject to } \|v_i\|_2^2 = 1 \ \ \forall i \in V$$

The rounding scheme picks a random halfspace passing through the origin and outputs 
the partition of the vertices induced by the halfspace. The value of the cut returned is 
guaranteed to be within a 0.878-factor of the SDP value. 

The same algorithm would be an approximation for Max Bisection if the cut returned
by the algorithm was near-balanced, i.e., $|S| \approx |V|/2$. Indeed, the expected number
of vertices on either side of the partition is $|V|/2$, since each vertex $i \in V$ falls
on a given side of a random halfspace with probability $1/2$.

If the balance of the partition returned is concentrated around its expectation then 
the Goemans and Williamson algorithm would yield a 0.878-approximation for Max Bi- 
section. However, the balance of the partition need not be concentrated, simply because 
the values taken by vertices could be highly correlated with each other! 
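
The halfspace rounding and the failure of concentration can be illustrated with a few
lines of code. This is a minimal sketch, assuming the SDP vectors are given as plain
lists of floats; the function name `hyperplane_round` is hypothetical and not taken
from the text.

```python
import random


def hyperplane_round(vectors, rng=random):
    """Round unit vectors to a cut with a random halfspace through the origin.

    `vectors` is a list of equal-length lists of floats.  Returns a list of
    +1/-1 labels, one per vector.
    """
    dim = len(vectors[0])
    # A standard Gaussian vector gives a uniformly random hyperplane normal.
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    labels = []
    for v in vectors:
        dot = sum(gi * vi for gi, vi in zip(g, v))
        labels.append(1 if dot >= 0 else -1)
    return labels


# Two antipodal vectors are always separated; identical ones never are.
vecs = [[1.0, 0.0], [-1.0, 0.0], [1.0, 0.0]]
labels = hyperplane_round(vecs, random.Random(0))
```

If every vertex were assigned the same vector, all vertices would land on the same side
in every run: the expected size of each side is still half the vertices, but the
realized cut is never balanced. This is the extreme case of the correlation problem
described above.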

SDP Relaxation. To exploit the correlations between the vertices we use a $k$-round
Lasserre SDP [Las01] of Max Bisection for a sufficiently large constant $k$. On a high
level, the solutions to the Lasserre SDP hierarchy are vectors that locally behave like a
distribution over integral solutions. The $k$-round Lasserre SDP has the following
properties similar to a true distribution over integral solutions.

- Marginal Distributions: For any subset $S$ of vertices with $|S| \le k$, the SDP will
yield a distribution $\mu_S$ on partial assignments to the vertices ($\{-1,1\}^S$). The
distributions $\mu_S$ and $\mu_T$ for a pair of subsets $S$ and $T$ are consistent on
their intersection $S \cap T$.

- Conditioning: Analogous to a true distribution over integral solutions, for any
subset $S \subseteq V$ with $|S| \le k$ and a partial assignment $\alpha \in \{-1, 1\}^S$,
the SDP solution can be conditioned on the event that $S$ is assigned $\alpha$.

A detailed description of the Lasserre SDP hierarchy for Max Bisection and other
CSPs will be given in Section 3.

Measuring Correlations. In this work, we will use mutual information as a measure
of correlation between two random variables. We refer the reader to Section 3 for the
definitions of Shannon entropy and mutual information. The correlation between
vertices $i$ and $j$ is given by

$$I_{\mu_{\{i,j\}}}(X_i; X_j) = H(X_i) - H(X_i \mid X_j),$$

where the random variables $X_i, X_j$ are sampled using the local distribution associated
with the Lasserre SDP. An SDP solution will be termed $\alpha$-independent if the average
mutual information between random pairs of vertices is at most $\alpha$, i.e.,
$\mathbb{E}_{i,j \in V}[I(X_i; X_j)] \le \alpha$.

For most natural rounding schemes such as the halfspace-rounding, the variance of
the balance of the cut returned is directly related to the average correlation between
random pairs of vertices in the graph. In other words, if the rounding scheme is applied
to an $\alpha$-independent SDP solution then the variance of the balance of the cut is at
most $\mathrm{poly}(\alpha)$.
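
The link between balance and pairwise correlation can be seen from a standard identity.
The following worked equation is a sketch in generic notation: the symbols $y_i$ (the
rounded $\pm 1$ value of vertex $i$), $n = |V|$ and $B$ (the balance) are introduced
here for illustration and do not appear in the text above.

```latex
B = \frac{1}{n}\sum_{i \in V} y_i, \qquad
\mathrm{Var}(B)
  = \frac{1}{n^2} \sum_{i,j \in V} \mathrm{Cov}(y_i, y_j)
  = \mathbb{E}_{i,j \in V}\bigl[\mathrm{Cov}(y_i, y_j)\bigr].
```

So the variance of the balance is exactly the average covariance over pairs of
vertices, and by Chebyshev's inequality $\Pr[|B - \mathbb{E}[B]| \ge t] \le
\mathrm{Var}(B)/t^2$. A small average covariance therefore forces the balance to be
close to its expectation.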

Obtaining Uncorrelated SDP Solutions. Intuitively, if it is the case that globally all 
the vertices are highly correlated, then conditioning on the value of a vertex should 
reveal information about the remaining vertices, therefore reducing the total entropy of 
all the vertices. 

Formally, let us suppose the $k$-round Lasserre SDP solution is not $\alpha$-independent,
i.e., $\mathbb{E}_{i,j \in V}[I(X_i; X_j)] > \alpha$. Let us pick a vertex $i \in V$ at
random, sample its value $b \in \{-1, 1\}$ and condition the SDP solution on the event
$X_i = b$. This conditioning reduces the average entropy of the vertices
($\mathbb{E}_{j \in V}[H(X_j)]$) by at least $\alpha$ in expectation. If the conditioned
SDP solution is $\alpha$-independent we are done, else we repeat the process.

The initial average entropy $\mathbb{E}_{j \in V}[H(X_j)]$ is at most 1, and the quantity
always remains non-negative. Therefore, within $1/\alpha$ conditionings, the SDP solution
will be $\alpha$-independent. Starting with a $k$-round Lasserre SDP solution, this process
produces a $(k - t)$-round $\alpha$-independent Lasserre SDP solution for some
$t \le 1/\alpha$.

Rounding Uncorrelated SDP Solutions. Given an $\alpha$-independent SDP solution, for
many natural rounding schemes the balance of the output cut is concentrated around its 
expectation. Hence it suffices to construct rounding schemes that output a balanced cut 
in expectation. We exhibit a simple rounding scheme that preserves the bias of each 
vertex individually, thereby preserving the global balance property. The details of the 
rounding algorithm will be described in Section 5. 
3 Preliminaries



Constraint Satisfaction Problem with Global Cardinality Constraints. In this sec- 
tion we formally define CSPs with global constraints. 

Definition 3.1 (Constraint Satisfaction Problems with Global Cardinality Constraints).
A constraint satisfaction problem with global cardinality constraints is specified by
$\Lambda = ([q], \mathcal{P}, k, c)$ where $[q] = \{0, \ldots, q - 1\}$ is a finite domain
and $\mathcal{P} = \{P : [q]^t \to [0, 1] \mid t \le k\}$ is a set of payoff functions. The
maximum number of inputs to a payoff function is denoted by $k$. The map
$c : [q] \to [0, 1]$ is the cardinality function, which satisfies $\sum_i c_i = 1$. For
any $0 \le i \le q - 1$, the solution should contain a $c_i$ fraction of the variables
with value $i$.

Remark 3.2. Although some problems (e.g., Balanced Separator) do not fix the cardi- 
nalities to be some specific quantities, they can be easily reduced to the above case. 

Definition 3.3. An instance $\Phi$ of a constraint satisfaction problem with global
cardinality constraints $\Lambda = ([q], \mathcal{P}, k, c)$ is given by
$\Phi = (\mathcal{V}, \mathcal{P}_{\mathcal{V}}, W)$ where

- $\mathcal{V} = \{x_1, \ldots, x_n\}$: variables taking values over $[q]$

- $\mathcal{P}_{\mathcal{V}}$ consists of the payoffs applied to subsets $S$ of size at most $k$

- Nonnegative weights $W = \{w_S\}$ satisfying $\sum_{|S| \le k} w_S = 1$. Thus we may
interpret $W$ as a probability distribution on the subsets. By $S \sim W$, we denote a
set $S$ chosen according to the probability distribution $W$

- An assignment should satisfy that the number of variables with value $i$ is $c_i n$ (we
may assume this is an integer).
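
To make the definition concrete, here is one possible in-memory representation of such
an instance. It is a sketch under stated assumptions: the class name `Instance`, its
field layout, and the encoding of payoffs as Python callables are all hypothetical
choices made for illustration, not part of the formal definition.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class Instance:
    """A CSP instance with global cardinality constraints (hypothetical layout)."""
    n: int                                   # number of variables x_0..x_{n-1}
    q: int                                   # domain size, values in 0..q-1
    # Each constraint: (weight, variable indices, payoff on their values).
    constraints: List[Tuple[float, Tuple[int, ...], Callable[..., float]]]
    c: Sequence[float]                       # c[i] = required fraction with value i

    def is_feasible(self, x: Sequence[int]) -> bool:
        """Check that exactly c[i] * n variables take value i, for every i."""
        counts: Dict[int, int] = {i: 0 for i in range(self.q)}
        for value in x:
            counts[value] += 1
        return all(counts[i] == round(self.c[i] * self.n) for i in range(self.q))

    def value(self, x: Sequence[int]) -> float:
        """Weighted payoff of assignment x; weights are assumed to sum to 1."""
        return sum(w * payoff(*(x[i] for i in S)) for w, S, payoff in self.constraints)


def cut(a, b):
    """Payoff of an edge constraint: 1 when the endpoints differ."""
    return 1.0 if a != b else 0.0


# Max Bisection on a 4-cycle with uniform edge weights.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
inst = Instance(n=4, q=2, constraints=[(0.25, e, cut) for e in edges], c=[0.5, 0.5])
```

For example, the alternating assignment `[0, 1, 0, 1]` is feasible and cuts every edge
of the cycle, while `[0, 0, 0, 1]` violates the cardinality constraint.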

Here we give a few examples of CSPs with global cardinality constraints. 

Definition 3.4 (Max(Min) Bisection). Given a (weighted) graph G = (V,E) with |V| 
even, the goal is to partition the vertices into two equal pieces such that the number 
(total weights) of edges that cross the cut is maximized (minimized). 

More generally, in an $\alpha$-Max Cut problem, the goal is to find a partition having
$\alpha n$ vertices on one side, while cutting the maximum number of edges. Furthermore,
one could allow weights on the vertices of the graph, and look for cuts with exactly an
$\alpha$-fraction of the weight on one side. Most of our techniques generalize to this
setting.

Throughout this work, we will have a weighted graph G with weights W on the 
vertices. The weights on the vertices are assumed to form a probability distribution. 
Hence the notation i ~ W refers to a random vertex sampled from the distribution W. 

Definition 3.5 (Edge Expansion). Given a graph $G = (V, E)$ (w.l.o.g., we may assume it is
an unweighted $d$-regular graph) and $\delta \in (0, 1/2)$, the goal is to find a set
$S \subseteq V$ such that $|S| = \delta |V|$ and the edge expansion of $S$,
$\Phi(S) = \frac{|E(S, V \setminus S)|}{d\,|S|}$, is minimized.
Information Theoretic Notions.



Definition 3.6. Let $X$ be a random variable taking values over $[q]$. The entropy of $X$
is defined as

$$H(X) \stackrel{\mathrm{def}}{=} -\sum_{i \in [q]} P(X = i) \log P(X = i)$$

Definition 3.7. Let $X$ and $Y$ be two jointly distributed variables taking values over
$[q]$. The mutual information of $X$ and $Y$ is defined as

$$I(X; Y) \stackrel{\mathrm{def}}{=} \sum_{i,j \in [q]} P(X = i, Y = j) \log \frac{P(X = i, Y = j)}{P(X = i) P(Y = j)}$$

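Definition 3.8. Let $X$ and $Y$ be two jointly distributed variables taking values over
$[q]$. The conditional entropy of $X$ conditioned on $Y$ is defined as

$$H(X \mid Y) = \mathbb{E}_{i \in [q]}[H(X \mid Y = i)]$$

where the expectation is over $i$ drawn according to the distribution of $Y$.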

We also give two well-known theorems in information theory below. 

Theorem 3.9. Let $X$ and $Y$ be two jointly distributed variables taking values over
$[q]$, then

$$I(X; Y) = H(X) - H(X \mid Y)$$
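
The three definitions above and the identity of Theorem 3.9 can be checked numerically
on a small joint distribution. This is an illustrative sketch: the function names are
hypothetical, the joint distribution is an arbitrary example, and logarithms are taken
base 2 (the text does not fix the base).

```python
import math


def entropy(p):
    """Shannon entropy (base 2) of a distribution given as a list of probabilities."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def marginals(joint):
    """Row and column marginals of joint[i][j] = P(X=i, Y=j)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return px, py


def mutual_information(joint):
    """I(X;Y) computed directly from the definition."""
    px, py = marginals(joint)
    return sum(
        pij * math.log2(pij / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pij in enumerate(row)
        if pij > 0
    )


def conditional_entropy(joint):
    """H(X|Y) = sum_j P(Y=j) * H(X | Y=j)."""
    _, py = marginals(joint)
    total = 0.0
    for j, pj in enumerate(py):
        if pj > 0:
            total += pj * entropy([row[j] / pj for row in joint])
    return total


joint = [[0.4, 0.1], [0.1, 0.4]]      # two positively correlated bits
px, _ = marginals(joint)
lhs = mutual_information(joint)
rhs = entropy(px) - conditional_entropy(joint)
```

Here `lhs` and `rhs` agree up to floating-point error. Two independent uniform bits
give mutual information 0, and two identical uniform bits give exactly 1 bit.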

Theorem 3.10. (Data Processing Inequality) Let $X, Y, Z, W$ be random variables such
that $H(X \mid W) = 0$ and $H(Y \mid Z) = 0$, i.e., $X$ is fully determined by $W$ and $Y$
is fully determined by $Z$, then

$$I(X; Y) \le I(W; Z)$$

Lasserre SDP Hierarchy for Globally Constrained CSPs. Let
$\Lambda = ([q], \mathcal{P}, k, c)$ be a CSP with global constraints and
$\Phi = (\mathcal{V}, \mathcal{P}_{\mathcal{V}}, W)$ be an instance of $\Lambda$ on
variables $\mathcal{V} = \{x_1, \ldots, x_n\}$. A solution to the $k$-round Lasserre SDP
consists of vectors $v_{S,\alpha}$ for all vertex sets $S \subseteq \mathcal{V}$ with
$|S| \le k$ and local assignments $\alpha \in [q]^S$. Also for each subset
$S \subseteq \mathcal{V}$ with $|S| \le k$, there is a distribution $\mu_S$ on $[q]^S$.
For two subsets $S, T$ such that $|S|, |T| \le k$, we require that the corresponding
distributions $\mu_S$ and $\mu_T$ are consistent when restricted to $S \cap T$. A Lasserre
solution is feasible if for any $|S \cup T| \le k$, $\alpha \in [q]^S$,
$\beta \in [q]^T$, we have

$$\langle v_{S,\alpha}, v_{T,\beta} \rangle = P_{\mu_{S \cup T}}(X_S = \alpha, X_T = \beta)$$

The SDP also has a vector $I$ that denotes the constant 1. The global cardinality
constraints can be written in terms of the marginals of each variable. Specifically, for
every $S$ with $|S| \le k - 1$ and $\alpha \in [q]^S$, we have

$$\mathbb{E}_{i \in \mathcal{V}}\bigl[P_{\mu_{S \cup \{i\}}}(x_i = j \mid X_S = \alpha)\bigr] = c_j \quad \text{for every } j \in [q].$$

The objective of the SDP is to maximize 

$$\mathbb{E}_{S \sim W}\, \mathbb{E}_{\beta \sim \mu_S}\bigl[P_S(\beta)\bigr].$$

While the complete description of the Lasserre SDP hierarchy is somewhat complicated,
there are a few properties of the hierarchy that we need. The most important property
is the existence of consistent local marginal distributions
$\{\mu_S\}_{S \subseteq \mathcal{V}, |S| \le k}$ whose first two moments match the inner
products of the vectors. We stress that even though the local distributions are
consistent, there might not exist a global distribution that agrees with all of them. The
second property of the $k$-round Lasserre SDP solution is that although the variables are
not jointly distributed, one can still condition on the assignment to any given variable
to obtain a solution to the $(k - 1)$-round Lasserre SDP that corresponds to the
conditioned distribution.

4 Globally Uncorrelated SDP Solutions

As remarked earlier, it is easy to round SDP solutions to a CSP with a cardinality
constraint if the variables behave like independent random variables. In this section, we
show a very simple procedure that starts with a solution to the $(k + \ell)$-round Lasserre
SDP and produces a solution to the $\ell$-round Lasserre SDP with the additional property
that globally the variables are somewhat "uncorrelated". To this end, we define the
notion of $\alpha$-independence for SDP solutions below. We remark that all the definitions
and results in this section can be applied to all CSPs.

Definition 4.1. Given a solution to the $k$-round Lasserre SDP relaxation, it is said to be
$\alpha$-independent if $\mathbb{E}_{i,j \sim W}[I_{\mu_{\{i,j\}}}(X_i; X_j)] \le \alpha$,
where $\mu_{\{i,j\}}$ is the local distribution associated with the pair of vertices
$\{i, j\}$.

Remark 4.2. We stress again that the variables in the SDP solution are not jointly
distributed. However, the notion is still well-defined here because of the locality of
mutual information: it only depends on the joint distribution of two variables, which is
guaranteed to exist by the SDP. Also, $\mu_{\{i,j\}}$ in the expression can be replaced
with $\mu_S$ for arbitrary $S$ with $i, j \in S$ and $|S| \le k$ because of the
consistency of local distributions.

The notion of $\alpha$-independence of random variables using mutual information easily
translates into the more familiar notion of statistical distance. Specifically, we have
the following relation.

Fact 4.3. Let $X$ and $Y$ be two jointly distributed random variables on $[q]$. Then

$$I(X; Y) \ge \frac{1}{2} \sum_{i,j \in [q]} \bigl(P(X = i, Y = j) - P(X = i) P(Y = j)\bigr)^2,$$

in particular for all $i, j \in [q]$

$$|P(X = i, Y = j) - P(X = i) P(Y = j)| \le \sqrt{2 I(X; Y)}$$

As a consequence, if $X$ and $Y$ are two random variables defined on $\{-1, 1\}$, then
$\mathrm{Cov}(X, Y) \le O(\sqrt{I(X; Y)})$.
For the sake of completeness, we include the proof of this observation in
Appendix B. Now we describe the procedure for obtaining an $\alpha$-independent
$\ell$-round Lasserre solution. A similar argument was concurrently discovered in [BRS11].
Here we reproduce the argument in information theoretic terms, while [BRS11] present the
argument in terms of covariance. The information theoretic argument is somewhat more
robust and cleaner in that it is independent of the sample space involved.
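
The pointwise bound of Fact 4.3 can be sanity-checked numerically. The sketch below
assumes base-2 logarithms and tests the inequality
$|P(X = i, Y = j) - P(X = i)P(Y = j)| \le \sqrt{2 I(X; Y)}$ on randomly generated joint
distributions; the helper names and the way the random distributions are sampled are
hypothetical and serve only as an illustration, not as a proof.

```python
import math
import random


def mutual_information(joint):
    """I(X;Y) in bits for joint[i][j] = P(X=i, Y=j)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(
        p * math.log2(p / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0
    )


def max_deviation(joint):
    """Largest |P(X=i, Y=j) - P(X=i) P(Y=j)| over all pairs (i, j)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return max(
        abs(p - px[i] * py[j])
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
    )


def random_joint(q, rng):
    """A random joint distribution on [q] x [q] (illustrative sampling only)."""
    weights = [[rng.random() for _ in range(q)] for _ in range(q)]
    total = sum(map(sum, weights))
    return [[w / total for w in row] for row in weights]


rng = random.Random(1)
samples = [random_joint(3, rng) for _ in range(200)]
holds = all(
    # max(0, .) guards against a tiny negative value from rounding error.
    max_deviation(j) <= math.sqrt(2 * max(0.0, mutual_information(j))) + 1e-12
    for j in samples
)
```
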



Algorithm 4.4. Input: A feasible solution to the $(k + \ell)$-round Lasserre SDP relaxation
as described in Section 3 for $k = 1/\sqrt{\alpha}$.

Output: An $\alpha$-independent solution to the $\ell$-round Lasserre SDP relaxation.

Sample indices $i_1, \ldots, i_k \in V$ independently according to $W$. Set $t = 1$.
Until the SDP solution is $\alpha$-independent repeat

- Sample the variable $X_{i_t}$ from its marginal distribution after the first $t - 1$
fixings, and condition the SDP solution on the outcome.

- $t = t + 1$.
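
The conditioning loop can be simulated on a genuine probability distribution. This is
only a sketch under a strong simplifying assumption: a true global distribution over
$\{-1, 1\}^n$, stored as a dictionary, stands in for the Lasserre solution (which in
general has only local distributions), and vertices are sampled uniformly rather than
from a general $W$. All function names are hypothetical.

```python
import math
import random


def pair_mi(dist, i, j):
    """Mutual information (bits) between coordinates i and j of `dist`.

    `dist` maps assignments (tuples over {-1, 1}) to probabilities.
    """
    joint, pi, pj = {}, {}, {}
    for x, p in dist.items():
        joint[(x[i], x[j])] = joint.get((x[i], x[j]), 0.0) + p
        pi[x[i]] = pi.get(x[i], 0.0) + p
        pj[x[j]] = pj.get(x[j], 0.0) + p
    return sum(p * math.log2(p / (pi[a] * pj[b]))
               for (a, b), p in joint.items() if p > 0)


def average_mi(dist, n):
    """Average mutual information over pairs (i, j) drawn uniformly and independently."""
    return sum(pair_mi(dist, i, j) for i in range(n) for j in range(n)) / n ** 2


def condition(dist, i, b):
    """Condition `dist` on the event X_i = b and renormalize."""
    mass = sum(p for x, p in dist.items() if x[i] == b)
    return {x: p / mass for x, p in dist.items() if x[i] == b}


def make_independent(dist, n, alpha, rng):
    """Condition on random coordinates until the average MI is at most alpha."""
    steps = 0
    while average_mi(dist, n) > alpha:
        i = rng.randrange(n)
        # Sample X_i from its current marginal, then condition on the outcome.
        p_plus = sum(p for x, p in dist.items() if x[i] == 1)
        b = 1 if rng.random() < p_plus else -1
        dist = condition(dist, i, b)
        steps += 1
    return dist, steps


# Four perfectly correlated bits: one conditioning removes all correlation.
rng = random.Random(0)
dist = {(1, 1, 1, 1): 0.5, (-1, -1, -1, -1): 0.5}
result, steps = make_independent(dist, 4, alpha=0.1, rng=rng)
```

In this extreme example every pair of coordinates carries one full bit of mutual
information, and fixing any single coordinate determines all the others, so the loop
stops after one step.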

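The following lemma shows that there exists $t$ such that the resulting solution is
$\alpha$-independent after $t$ conditionings with high probability.

Lemma 4.5. There exists $t \le k$ such that

$$\mathbb{E}_{i_1, \ldots, i_t \sim W}\, \mathbb{E}_{i,j \sim W}\bigl[I(X_i; X_j \mid X_{i_1}, \ldots, X_{i_t})\bigr] \le \frac{\log q}{k - 2}$$

Proof. By linearity of expectation, we have that for any $t \le k - 2$

$$\mathbb{E}_{i, i_1, \ldots, i_t \sim W}[H(X_i \mid X_{i_1}, \ldots, X_{i_t})] = \mathbb{E}_{i, i_1, \ldots, i_{t-1} \sim W}[H(X_i \mid X_{i_1}, \ldots, X_{i_{t-1}})] - \mathbb{E}_{i_1, \ldots, i_{t-1} \sim W}\, \mathbb{E}_{i, i_t \sim W}[I(X_i; X_{i_t} \mid X_{i_1}, \ldots, X_{i_{t-1}})]$$

Adding the equalities from $t = 1$ to $t = k - 2$, we get

$$\mathbb{E}_{i \sim W}[H(X_i)] - \mathbb{E}_{i, i_1, \ldots, i_{k-2} \sim W}[H(X_i \mid X_{i_1}, \ldots, X_{i_{k-2}})] = \sum_{1 \le t \le k-2} \mathbb{E}_{i_1, \ldots, i_{t-1} \sim W}\, \mathbb{E}_{i, i_t \sim W}[I(X_i; X_{i_t} \mid X_{i_1}, \ldots, X_{i_{t-1}})]$$

The lemma follows from the fact that for each $i$, $H(X_i) \le \log q$: the left-hand side
is then at most $\log q$, so at least one of the $k - 2$ nonnegative terms on the right is
at most $\log q / (k - 2)$. □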

Theorem 4.6. For every $\alpha > 0$ and positive integer $\ell$, there exists an algorithm
running in time $O(n^{\mathrm{poly}(1/\alpha) + \ell})$ that finds an $\alpha$-independent
solution to the $\ell$-round Lasserre SDP, with an SDP objective value of at least
$\mathrm{OPT} - \alpha$, where OPT denotes the optimum value of the $\ell$-round Lasserre
SDP relaxation.

Proof. Pick $k = 4 \log q / \alpha^2$. Solve the $(k + \ell)$-round Lasserre SDP, and use
its solution as input to the conditioning algorithm described earlier. Notice that the
algorithm respects the marginal distributions provided by the SDP while sampling the
values of variables. Therefore, the expected objective value of the SDP solution after
conditioning is exactly equal to the SDP objective value before conditioning. Also notice
that the SDP value is at most 1. Therefore, the probability of the SDP value dropping by
at least $\alpha$ due to conditioning is at most $1/(1 + \alpha)$.

Also, by Lemma 4.5 and Markov's inequality, the probability of the algorithm failing
to find a $\sqrt{\log q/(k-2)}$-independent solution is at most $\sqrt{\log q/(k-2)}$.
Therefore, by the union bound, there exists a fixing such that the SDP value is maintained
up to $\alpha$, and the solution after conditioning is $\alpha$-independent. Moreover, this
particular fixing can be found using brute-force search.

□ 
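
The bound $1/(1 + \alpha)$ used in the proof follows from Markov's inequality applied
to a nonnegative quantity. The following is a sketch of that step; the symbols
$\mathrm{val}$ (the SDP value after conditioning, a random variable in $[0, 1]$) and
$v$ (the SDP value before conditioning, so that $\mathbb{E}[\mathrm{val}] = v$) are
introduced here for illustration.

```latex
\Pr[\mathrm{val} \le v - \alpha]
  = \Pr[\,1 - \mathrm{val} \ge 1 - v + \alpha\,]
  \le \frac{\mathbb{E}[1 - \mathrm{val}]}{1 - v + \alpha}
  = \frac{1 - v}{1 - v + \alpha}
  \le \frac{1}{1 + \alpha}.
```

The last inequality holds because $x / (x + \alpha)$ is increasing in $x \ge 0$ and
$1 - v \le 1$.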



5 Rounding Scheme for Max Bisection 

In this section, we present and analyze a natural rounding scheme for Max Bisection.
Given a globally uncorrelated SDP solution to a 2-round Lasserre SDP relaxation of
Max Bisection, the rounding scheme will output a cut with the approximation guarantees
outlined in Theorem 1.1. The same rounding scheme also yields a 0.92-approximation
algorithm for arbitrary globally constrained Max 2-Sat problems.

Constructing Goemans-Williamson type SDP solution. In the 2-round Lasserre
SDP for Max Bisection, there are two orthogonal vectors $v_{i0}$ and $v_{i1}$ for each
variable $x_i$. This can be used to obtain a solution to the Goemans-Williamson SDP by
simply defining $v_i \stackrel{\mathrm{def}}{=} v_{i0} - v_{i1}$. The following
proposition is an easy consequence.

Proposition 5.1. Let $v_i = v_{i0} - v_{i1} = (2p_i - 1)I + w_i$ where
$p_i = P(x_i = 0)$. Then, for each edge $e = (i, j) \in E$,
$P_{\mu_e}(x_i \ne x_j) = \|v_i - v_j\|^2 / 4$.

Proof.

$$\|v_i - v_j\|^2 = 2 - 2\langle v_{i0} - v_{i1}, v_{j0} - v_{j1} \rangle = 2 - 2\bigl(P_{\mu_e}(x_i = x_j) - P_{\mu_e}(x_i \ne x_j)\bigr) = 4 P_{\mu_e}(x_i \ne x_j)$$

□ 
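
Proposition 5.1 can be checked on vectors that come from a genuine distribution. The
sketch below assumes a standard way of realizing such vectors: one coordinate per
assignment $\omega$ in the support, with value $\sqrt{P(\omega)}$ on the assignments
consistent with $x_i = b$. This construction and the function names are illustrative
assumptions, not something specified in the text; a general Lasserre solution need not
arise from a global distribution.

```python
import math


def lasserre_vectors(dist, n):
    """Vectors v_{i,b} realizing a genuine distribution over {0,1}^n.

    `dist` maps assignments (tuples) to probabilities.  Coordinate omega of
    v_{i,b} is sqrt(P(omega)) if x_i = b under omega, and 0 otherwise, so that
    <v_{i,a}, v_{j,b}> = P(x_i = a, x_j = b).
    """
    support = list(dist)
    vec = {}
    for i in range(n):
        for b in (0, 1):
            vec[i, b] = [math.sqrt(dist[w]) if w[i] == b else 0.0 for w in support]
    return vec


def gw_vector(vec, i):
    """The Goemans-Williamson type vector v_i = v_{i0} - v_{i1}."""
    return [a - b for a, b in zip(vec[i, 0], vec[i, 1])]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


# An arbitrary joint distribution of two bits (x_0, x_1).
dist = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}
vec = lasserre_vectors(dist, 2)
v0, v1 = gw_vector(vec, 0), gw_vector(vec, 1)
p_differ = dist[0, 1] + dist[1, 0]
```

With these vectors, `sq_dist(v0, v1) / 4` matches `p_differ` up to floating-point
error, and both `v0` and `v1` have unit length.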

Let $w_i$ be the component of $v_i$ orthogonal to the $I$ vector, i.e.,
$w_i \stackrel{\mathrm{def}}{=} v_i - \langle v_i, I \rangle I$. Using
$v_{i0} + v_{i1} = I$ and $\langle v_{i0}, v_{i1} \rangle = 0$, we get
$v_{i0} = \langle v_{i0}, I \rangle I + w_i/2$ and
$v_{i1} = \langle v_{i1}, I \rangle I - w_i/2$. We remark that $w_i$ is the crucial
component that captures the correlation between $x_i$ and other variables. To formalize
this, we show the following lemma.

Lemma 5.2. Let V{ and vj be the unit vectors constructed above, Wi and Wj be the com- 
ponents of Vi and vj that orthogonal to I. Then \{wj, Wj)\ < 4 J2I(Xi, xj) 

def def 

Proof Let p t = P(x ; - = 0) = (v i0 ,I) and pj = P(xj - 0) = (vjoJ). Notice that 

\F( Xi - 0, Xj = 0)-P(jc,- = 0)P(xj = 0)1 = Wipil+m/^Pjl+Wj/D-piPjW - \(wi,Wj)\/4 

By applying Fact 4.3, we get \(w{, wj\ < 4 ^21{xi\xj □ 
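The identity and the bound in Lemma 5.2 can be checked numerically on a small example. The sketch below is illustrative only: the joint distribution `mu` is made up, mutual information is measured in bits (as in Appendix B), and the helper names are hypothetical.

```python
import math
from itertools import product

# Hypothetical joint distribution of (x_i, x_j) on {0,1}^2.
mu = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
outcomes = list(product((0, 1), repeat=2))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

I_vec = [math.sqrt(mu[o]) for o in outcomes]          # the unit vector I

def signed_vector(var):
    """v = v_{.0} - v_{.1} for the given variable."""
    return [math.sqrt(mu[o]) * (1 if o[var] == 0 else -1) for o in outcomes]

def orthogonal_part(v):
    """w = v - <v, I> I."""
    c = dot(v, I_vec)
    return [x - c * y for x, y in zip(v, I_vec)]

def mutual_information_bits(joint):
    px = {a: sum(joint[(a, b)] for b in (0, 1)) for a in (0, 1)}
    py = {b: sum(joint[(a, b)] for a in (0, 1)) for b in (0, 1)}
    return sum(p * math.log2(p / (px[a] * py[b]))
               for (a, b), p in joint.items() if p > 0)

w_i = orthogonal_part(signed_vector(0))
w_j = orthogonal_part(signed_vector(1))

p_i = mu[(0, 0)] + mu[(0, 1)]                # P(x_i = 0)
p_j = mu[(0, 0)] + mu[(1, 0)]                # P(x_j = 0)
cov = mu[(0, 0)] - p_i * p_j                 # P(x_i=0, x_j=0) - P(x_i=0)P(x_j=0)

inner = dot(w_i, w_j)
bound = 4 * math.sqrt(2 * mutual_information_bits(mu))
print(inner, 4 * cov, bound)
```

The first two printed values coincide (the identity used in the proof), and both are below the third (the bound of the lemma).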
Henceforth we will switch from the alphabet $\{0, 1\}$ to $\{-1, 1\}$ (see footnote 2). After this
transformation, we can interpret the inner product $\mu_i = \langle v_i, I\rangle = p_i - (1 - p_i)$ as the bias
of vertex $i$.

5.1 Rounding Scheme 

Roughly speaking, the algorithm applies a hyperplane rounding on the vectors $w_i =
v_i - \langle v_i, I\rangle I$ associated with the vertices $i \in V$. However, for each vertex $i \in V$, the
algorithm shifts the hyperplane according to the bias of that vertex.



Algorithm 5.3. Given: A set of unit vectors $\{v_1, \ldots, v_n\}$ where $v_i = \mu_i I + w_i$, and $w_i$
is the component of $v_i$ orthogonal to $I$.

Pick a random Gaussian vector $g$ orthogonal to $I$ with coordinates distributed as
$\mathcal{N}(0, 1)$. For every $i$,

1. Project $g$ on the direction of $w_i$, i.e., $\xi_i = \langle g, \bar{w}_i\rangle$, where $\bar{w}_i = w_i/\|w_i\|_2$ is the
normalized vector of $w_i$. Note that $\xi_i$ is also a standard Gaussian variable.

2. Pick the threshold $t_i$ as follows:

    $t_i = \Phi^{-1}(\mu_i/2 + 1/2)$

3. If $\xi_i \le t_i$, set $x_i = 1$, otherwise set $x_i = -1$.

Notice that the threshold $t_i$ is chosen so that individually the bias of $x_i$ is exactly
$\mu_i$. Therefore, the expected balance of the rounded solution matches the intended value.
The analysis of the rounding algorithm consists of two parts: first we show that the cut
returned by the rounding algorithm has high expected value, then we show that the
balance of the cut is concentrated around its expectation.
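The steps of Algorithm 5.3 can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used for the results in this work. It assumes each $w_i$ is given as a coordinate list in some orthonormal basis of the subspace orthogonal to $I$, so that sampling i.i.d. Gaussian coordinates yields a Gaussian vector $g$ orthogonal to $I$. The function name and the handling of fully biased vertices ($w_i = 0$) are assumptions.

```python
import math
import random
from statistics import NormalDist

def round_with_shifted_hyperplane(biases, w_vectors, rng):
    """One run of the rounding: biases[i] = mu_i, w_vectors[i] = w_i (coordinates
    in a basis of the subspace orthogonal to I). Returns a list of +1/-1 values."""
    std_normal = NormalDist()
    dim = len(w_vectors[0])
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]        # random Gaussian vector
    assignment = []
    for mu, w in zip(biases, w_vectors):
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:                                  # |mu| = 1: already integral
            assignment.append(1 if mu > 0 else -1)
            continue
        xi = sum(gc * wc for gc, wc in zip(g, w)) / norm  # step 1: projection
        t = std_normal.inv_cdf(mu / 2 + 0.5)              # step 2: threshold
        assignment.append(1 if xi <= t else -1)           # step 3: compare
    return assignment

# Usage on two made-up unit vectors with biases 0.5 and -0.2.
rng = random.Random(0)
biases = [0.5, -0.2]
w_vectors = [[math.sqrt(1 - 0.5 ** 2), 0.0], [0.0, math.sqrt(1 - 0.2 ** 2)]]
trials = 20000
sums = [0, 0]
for _ in range(trials):
    x = round_with_shifted_hyperplane(biases, w_vectors, rng)
    sums[0] += x[0]
    sums[1] += x[1]
means = [s / trials for s in sums]
print(means)
```

The empirical means should be close to the prescribed biases, which is the bias-preservation property noted after the algorithm.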

5.2 Analysis of the Cut Value 

Analyzing the cut value of the rounding scheme is fairly standard albeit a bit technical. 
The analysis is local as in the case of other algorithms for CSPs, and reduces to bounding 
the probability that a given edge is cut. The probability that a given edge $(u, v)$ is cut
corresponds to the probability of an event involving two correlated Gaussians.

By using numerical techniques, we were able to show that the cut value is at least
0.85 times the SDP optimum. Analytically, we show the following asymptotic relation.

Lemma 5.4. Let $u = \mu_1 I + w_1$ and $v = \mu_2 I + w_2$ be two unit vectors satisfying $\|u - v\|^2/4 \le \varepsilon$.
Then the probability of them being separated by Algorithm 5.3 is at most $O(\sqrt{\varepsilon})$.

The proof of this lemma is fairly technical and is deferred to Appendix A. 
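For orientation, the corresponding bound for plain (unshifted) random-hyperplane rounding follows from a short calculation. The derivation below is a standard fact about Goemans-Williamson rounding, included here only as a worked illustration; it assumes $\|u - v\|^2/4 = \varepsilon$ exactly. The last step uses that $\arcsin$ is convex on $[0, 1]$, so $\arcsin x \le \pi x/2$ there.

```latex
% Assumption: u, v are unit vectors with \|u - v\|^2 / 4 = \varepsilon,
% and "GW" denotes unshifted random-hyperplane rounding.
\langle u, v \rangle = 1 - \tfrac{1}{2}\|u - v\|^2 = 1 - 2\varepsilon,
\qquad
\Pr[\text{GW separates } u, v]
  = \frac{\arccos(1 - 2\varepsilon)}{\pi}
  = \frac{2}{\pi}\arcsin\sqrt{\varepsilon}
  \le \sqrt{\varepsilon}.
```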
2 The mapping is given by $0 \to 1$ and $1 \to -1$.
5.3 Analysis of the Balance

In this section we show that the balance of the rounded solution is highly concentrated.
We prove this fact by bounding the variance of the balance. Specifically, we
show that if the SDP solution is $\alpha$-independent, then the variance of the balance can be
bounded above by a function of $\alpha$.

The proof in this section is information-theoretic. Although this approach gives a
sub-optimal bound, the proof itself is very simple and clean.

Lemma 5.5. Let $v_i = \mu_i I + w_i$ and $v_j = \mu_j I + w_j$ be two vectors in the SDP solution
that satisfy $|\langle w_i, w_j\rangle| \le \xi$. Let $y_i$ and $y_j$ be the rounded solution of $v_i$ and $v_j$. Then
$I(y_i; y_j) \le O(\xi^{1/3})$.

Proof. Since

    $|\langle w_i, w_j\rangle| = \sqrt{1 - \mu_i^2}\,\sqrt{1 - \mu_j^2}\,|\langle \bar{w}_i, \bar{w}_j\rangle| \le \xi,$

one of the three factors in the equation above is at most $\xi^{1/3}$. If it
is the case that $\sqrt{1 - \mu_i^2} \le \xi^{1/3}$ or $\sqrt{1 - \mu_j^2} \le \xi^{1/3}$ (w.l.o.g. we can assume it is the first
case), then we have

    $\min(|1 - \mu_i|, |1 + \mu_i|) \le O(\xi^{2/3}).$

We may assume $\mu_i > 0$, therefore $1 - \mu_i \le O(\xi^{2/3})$. Notice that our rounding scheme
preserves the bias individually, which implies that $y_i$ is a highly biased binary variable, hence

    $I(y_i; y_j) \le H(y_i) = O(-(1 - \mu_i)\log(1 - \mu_i)) \le O(\xi^{1/3}).$

Now assume it is the case that $|\langle \bar{w}_i, \bar{w}_j\rangle| \le \xi^{1/3}$. Let $g_1 = g \cdot \bar{w}_i$ and $g_2 = g \cdot \bar{w}_j$
as described in the rounding scheme, and let $\rho = \langle \bar{w}_i, \bar{w}_j\rangle$. Hence $g_1$ and $g_2$ are two jointly
distributed standard Gaussian variables with covariance matrix

    $\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}.$

The mutual information of $g_1$ and $g_2$ is

    $I(g_1; g_2) = -\frac{1}{2}\log(\det \Sigma) \le O(-\log(1 - \xi^{2/3})) \le O(\xi^{1/3}).$

Notice that $y_i$ is fully determined by $g_1$ and $y_j$ by $g_2$. Therefore, by the data processing inequality
(Theorem 3.10), we have $I(y_i; y_j) \le I(g_1; g_2) \le O(\xi^{1/3})$. □
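The data-processing step can be illustrated numerically. The sketch below compares the mutual information of two correlated standard Gaussians with that of their thresholded versions. It is an illustration under assumptions: the joint law of the rounded bits is computed with the standard identity $\partial \Phi_2(h, k; \rho)/\partial \rho = \phi_2(h, k; \rho)$ for the bivariate normal (not taken from this work), integrated with a midpoint rule, and all function names are hypothetical. Mutual information is in bits.

```python
import math
from statistics import NormalDist

STD = NormalDist()

def joint_cdf(h, k, rho, steps=2000):
    """P(g1 <= h, g2 <= k) for standard normals with correlation rho, |rho| < 1,
    as Phi(h) * Phi(k) plus the integral over s in [0, rho] of the density at (h, k)."""
    total = 0.0
    for m in range(steps):
        s = rho * (m + 0.5) / steps
        q = 1.0 - s * s
        total += math.exp(-(h * h - 2 * s * h * k + k * k) / (2 * q)) / (2 * math.pi * math.sqrt(q))
    return STD.cdf(h) * STD.cdf(k) + total * rho / steps

def rounded_mutual_information_bits(mu_i, mu_j, rho):
    """I(y_i; y_j) when y = +1 iff the Gaussian is at most Phi^{-1}(mu/2 + 1/2)."""
    a, b = mu_i / 2 + 0.5, mu_j / 2 + 0.5            # P(y_i = 1), P(y_j = 1)
    p11 = joint_cdf(STD.inv_cdf(a), STD.inv_cdf(b), rho)
    joint = {(1, 1): p11, (1, -1): a - p11, (-1, 1): b - p11, (-1, -1): 1 - a - b + p11}
    marg_i = {1: a, -1: 1 - a}
    marg_j = {1: b, -1: 1 - b}
    return sum(p * math.log2(p / (marg_i[y] * marg_j[z]))
               for (y, z), p in joint.items() if p > 0)

def gaussian_mutual_information_bits(rho):
    """-1/2 * log2(det Sigma) for Sigma = [[1, rho], [rho, 1]]."""
    return -0.5 * math.log2(1 - rho * rho)

info_rounded = rounded_mutual_information_bits(0.2, -0.4, 0.3)
info_gauss = gaussian_mutual_information_bits(0.3)
print(info_rounded, info_gauss)
```

For these made-up parameters the rounded bits carry strictly less information than the Gaussians, as the data processing inequality requires.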

Theorem 5.6. Suppose we are given an $\alpha$-independent solution to the 2-round Lasserre SDP hierarchy.
Let $\{y_i\}$ be the rounded solution after applying Algorithm 5.3. Define $S = E_{i \sim W}[y_i]$. Then

    $\mathrm{Var}(S) \le O(\alpha^{1/12}).$

Proof.

    $\mathrm{Var}(S) = E_{i,j \sim W}[\mathrm{Cov}(y_i, y_j)]$
    $\le E_{i,j \sim W}\left[O\left(\sqrt{I(y_i; y_j)}\right)\right]$    (by Fact 4.3)
    $\le E_{i,j \sim W}\left[O\left(\sqrt{|\langle w_i, w_j\rangle|^{1/3}}\right)\right]$    (by Lemma 5.5)
    $\le E_{i,j \sim W}\left[O\left(\sqrt{I(x_i; x_j)^{1/6}}\right)\right]$    (by Lemma 5.2)
    $\le O\left(\left(E_{i,j \sim W}[I(x_i; x_j)]\right)^{1/12}\right)$    (by concavity of the function $x^{1/12}$)
    $\le O(\alpha^{1/12})$

□

Corollary 5.7. Given an $\alpha$-independent solution $v_i = \mu_i I + w_i$ to the 2-round Lasserre SDP
hierarchy, the rounding algorithm will find an $O(\alpha^{1/24})$-balanced cut (that is, the balance
of the cut differs from the expected value by at most an $O(\alpha^{1/24})$ fraction of the total weight)
with probability at least $1 - O(\alpha^{1/24})$.

5.4 Wrapping Up 

Here we present the proofs of the main theorems of this work. 

Proof of Theorem 1.2. Suppose we are given a Min Bisection instance $G = (V, E)$
with value at most $\varepsilon$ and a constant $\delta > 0$. By setting $\alpha = \delta^{24}$ and applying Theorem 4.6,
we get an $\alpha$-independent solution with value at most $\varepsilon + \alpha$. By Lemma 5.4 and the
concavity of the function $\sqrt{x}$, the expected size of the cut returned by Algorithm 5.3 is at
most $O(\sqrt{\varepsilon + \alpha}) = O(\sqrt{\varepsilon} + \sqrt{\alpha})$. Therefore, with constant probability (say 1/2), the cut
returned by the rounding algorithm has size at most $O(\sqrt{\varepsilon} + \sqrt{\alpha})$. Also, by Corollary 5.7,
the cut will be $O(\delta)$-balanced with probability at least $1 - O(\delta)$. Therefore, by the union
bound, the algorithm will return an $O(\delta)$-balanced cut with value at most $O(\sqrt{\varepsilon} + \sqrt{\alpha})$
with constant probability. Notice that this probability can be amplified to $1 - \varepsilon$ by running
the algorithm $O(\log(1/\varepsilon))$ times. Given such a cut, we can simply move an $O(\delta)$ fraction
of the vertices with least degree from the larger side to the smaller side to get an exact
bisection. This process will increase the value of the cut by at most $O(\delta)$. Therefore, in
this case, we get a bisection of value at most $O(\sqrt{\varepsilon} + \sqrt{\alpha} + \delta) = O(\sqrt{\varepsilon} + \delta)$. Hence,
the expected value of the bisection returned by the rounding algorithm is at most
$(1 - \varepsilon)\,O(\sqrt{\varepsilon} + \delta) + \varepsilon = O(\sqrt{\varepsilon} + \delta)$.
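The final balance-correction step of the proof can be sketched as follows. This is a minimal illustration for unweighted graphs; the function names and the graph representation (a dictionary from integer vertices to neighbour sets) are assumptions. Moving a vertex across the cut changes the cut size by at most its degree, which is why the lowest-degree vertices are moved first.

```python
def fix_balance(adjacency, side):
    """Move lowest-degree vertices from the larger side to the smaller side
    until the two sides differ in size by at most one.
    adjacency: dict vertex -> set of neighbours; side: dict vertex -> +1 or -1."""
    side = dict(side)
    plus = [v for v in side if side[v] == 1]
    minus = [v for v in side if side[v] == -1]
    larger, sign = (plus, 1) if len(plus) >= len(minus) else (minus, -1)
    surplus = abs(len(plus) - len(minus)) // 2
    for v in sorted(larger, key=lambda u: len(adjacency[u]))[:surplus]:
        side[v] = -sign
    return side

def cut_size(adjacency, side):
    """Number of edges whose endpoints lie on different sides."""
    return sum(1 for u in adjacency for v in adjacency[u]
               if u < v and side[u] != side[v])

# Usage on a path 0-1-2-3-4-5 with an unbalanced cut (4 vertices vs 2).
adjacency = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
unbalanced = {0: 1, 1: 1, 2: 1, 3: 1, 4: -1, 5: -1}
balanced = fix_balance(adjacency, unbalanced)
print(cut_size(adjacency, unbalanced), cut_size(adjacency, balanced))
```

In this toy instance exactly one vertex of degree 1 is moved, so the cut grows by at most 1.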

Proof of Theorem 1.1. The proof is similar in the case of Max Bisection. The only
difference is that we have to use the fact that the rounding scheme is balanced, i.e.,
$P(F(v) \neq F(-v)) = 1$. Hence, by Lemma 5.4, for any edge $(u, v)$ with value $1 - \varepsilon$ in the
SDP solution, the algorithm separates them with probability at least $1 - O(\sqrt{\varepsilon})$. The rest
of the proof is identical.

Using a computer-assisted proof, we can show that the approximation ratio of this
algorithm for Max Bisection is between 0.85 and 0.86, thus further narrowing the
gap between approximation and inapproximability of Max Bisection. Using the same
algorithm, we obtain a 0.92-approximation for globally constrained Max 2-Sat. It is
known that under the Unique Games Conjecture, Max 2-Sat is NP-hard to approximate
within 0.9401.

6 Dictatorship Tests from Globally Uncorrelated SDP Solutions

A dictatorship test DICT for the Max Bisection problem consists of a graph on the set
of vertices $\{\pm 1\}^R$. By convention, the graph DICT is a weighted graph where the edge
weights form a probability distribution (sum up to 1). We will write $(z, z') \in \mathrm{DICT}$ to
denote an edge sampled from the graph DICT (here $z, z' \in \{\pm 1\}^R$).

A cut of the DICT graph can be thought of as a boolean function $\mathcal{F} : \{\pm 1\}^R \to \{\pm 1\}$.
The value of a cut $\mathcal{F}$, given by

    $\mathrm{DICT}(\mathcal{F}) = \frac{1}{2}\, E_{(z,z') \in \mathrm{DICT}}\left[1 - \mathcal{F}(z)\mathcal{F}(z')\right],$

is the probability that $z, z'$ are on different sides of the cut. It is also useful to define
$\mathrm{DICT}(\mathcal{F})$ for non-boolean functions $\mathcal{F} : \{\pm 1\}^R \to [-1, 1]$ that take values in the interval
$[-1, 1]$. To this end, we will interpret a value $\mathcal{F}(z) \in [-1, 1]$ as a random variable
that takes $\{\pm 1\}$ values. Specifically, we think of a number $a \in [-1, 1]$ as the following
random variable:

    $a = \begin{cases} -1 & \text{with probability } \frac{1-a}{2} \\ \phantom{-}1 & \text{with probability } \frac{1+a}{2} \end{cases}$    (6.1)

With this interpretation, the natural definition of $\mathrm{DICT}(\mathcal{F})$ for such a function is as follows:

    $\mathrm{DICT}(\mathcal{F}) = \frac{1}{2}\, E_{(z,z') \in \mathrm{DICT}}\left[1 - \mathcal{F}(z)\mathcal{F}(z')\right].$

Indeed, the above expression is equal to the expected value of the cut obtained by randomly
rounding the values of the function $\mathcal{F} : \{\pm 1\}^R \to [-1, 1]$ to $\{\pm 1\}$ as described in
Equation (6.1).
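The claim that the formula for fractional cuts equals the expected value after independent rounding can be verified by brute force on a tiny graph. The sketch below is illustrative: the edge list, its weights and the values of the function are made up, and it assumes the graph has no self-loops and that distinct vertices are rounded independently.

```python
from itertools import product

def dict_value(edges, F):
    """1/2 * E[1 - F(z) F(z')] over a weighted edge list [(z, z', weight)]."""
    return 0.5 * sum(w * (1 - F[z] * F[zp]) for z, zp, w in edges)

def expected_rounded_value(edges, F):
    """Exact expected cut value when every vertex z is independently set to
    +1 with probability (1 + F[z]) / 2 and to -1 otherwise, as in (6.1)."""
    verts = sorted(F)
    total = 0.0
    for signs in product((-1, 1), repeat=len(verts)):
        prob = 1.0
        for v, s in zip(verts, signs):
            prob *= (1 + s * F[v]) / 2
        total += prob * dict_value(edges, dict(zip(verts, signs)))
    return total

# Made-up weighted graph on the hypercube {±1}^2 (edge weights sum to 1).
edges = [((-1, -1), (1, 1), 0.5), ((-1, 1), (1, -1), 0.3), ((-1, -1), (-1, 1), 0.2)]
F = {(-1, -1): 0.5, (-1, 1): -0.25, (1, -1): 0.0, (1, 1): -1.0}
fractional = dict_value(edges, F)
rounded = expected_rounded_value(edges, F)
print(fractional, rounded)
```

The two printed values agree because, for distinct vertices, the expectation of the product of the rounded values is the product of their means.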

We will construct a dictatorship test for the weighted version of Max Bisection. In
particular, each vertex $x \in \{\pm 1\}^R$ of DICT is associated with a weight $W(x)$, and the weights
$W$ form a probability distribution over $\{\pm 1\}^R$ (sum up to 1). The balance condition on
the cut can now be expressed as $E_{z \sim W}[\mathcal{F}(z)] = 0$.

The dictatorship test DICT can be easily transformed into a dictatorship test DICT'
for unweighted Max Bisection. The idea is to replace each vertex $x \in \{\pm 1\}^R$ with a
cluster $V_x$ of $\lfloor W(x) \cdot M \rfloor$ vertices for some large integer $M$. For every edge $(x, y)$ in
DICT, connect every pair of vertices in the corresponding clusters $V_x, V_y$ with an edge of
the same weight. Given any bisection $\mathcal{F}' : \mathrm{DICT}' \to \{\pm 1\}$ of the graph DICT' with value
$c$, define $\mathcal{F}(z) = E_{v \in V_z}\, \mathcal{F}'(v)$. By slightly correcting the balance of $\mathcal{F}$, it is easy to obtain
a bisection $\mathcal{F} : \{\pm 1\}^R \to [-1, 1]$ satisfying

    $\mathrm{DICT}(\mathcal{F}) \ge c - o_M(1), \qquad E_z\, \mathcal{F}(z) = 0.$

Conversely, given a bisection $\mathcal{F} : \{\pm 1\}^R \to [-1, 1]$ of DICT, assign a $(1 + \mathcal{F}(z))/2$ fraction
of the vertices of $V_z$ to be 1 and the rest to $-1$. The resulting partition of DICT' is very close
to balanced (up to rounding errors), and can be modified into a bisection with value
$\mathrm{DICT}(\mathcal{F}) - o_M(1)$.

The dictator cuts are given by the functions $\mathcal{F}(z) = z^{(\ell)}$ for some $\ell \in [R]$. The
dictatorship test graph is so constructed that each dictator cut will yield a bisection, and
the Completeness of the test DICT is the minimum value of a dictator cut, i.e.,

    $\mathrm{Completeness}(\mathrm{DICT}) = \min_{\ell \in [R]} \mathrm{DICT}(z^{(\ell)}).$

The soundness of the dictatorship test is the value of bisections of DICT that are far from
every dictator. We will formalize the notion of being far from every dictator using the
notion of influences.

Influences and Noise Operators. To this end, we recall the definitions of influences
and noise operators. Let $\Omega = (\{\pm 1\}, \mu)$ denote the probability space with atoms $\{\pm 1\}$ and
a distribution $\mu$ on them. Then, the influences and noise operators for functions over the
product space $\Omega^R$ are defined as follows.

Definition 6.1 (Influences). The influence of the $\ell$-th coordinate on a function $\mathcal{F} :
\{\pm 1\}^R \to \mathbb{R}$ under a distribution $\mu$ over $\{\pm 1\}$ is given by

    $\mathrm{Inf}^{\mu}_{\ell}(\mathcal{F}) = E_{x^{(-\ell)}}\left[\mathrm{Var}_{x^{(\ell)}}[\mathcal{F}(x)]\right].$

Definition 6.2 (Noise operator). For a function $\mathcal{F} : \Omega^R \to \mathbb{R}$, define $T_{1-\varepsilon}\mathcal{F}$ as

    $T_{1-\varepsilon}\mathcal{F}(z) = E[\mathcal{F}(\tilde{z}) \mid z],$

where each coordinate $\tilde{z}^{(\ell)}$ of $\tilde{z}$ is equal to $z^{(\ell)}$ with probability $1 - \varepsilon$ and a random element
from $\Omega$ with probability $\varepsilon$.

Invariance Principle. The following invariance principle is an immediate consequence
of Theorem 3.6 in the work of Isaksson and Mossel [IM09].

Theorem 6.3. (Invariance Principle [IM09]) Let $\Omega$ be a finite probability space with the
least non-zero probability of an atom at least $\alpha \le 1/2$. Let $\mathcal{L} = \{\ell_1, \ell_2\}$ be an ensemble
of random variables over $\Omega$. Let $\mathcal{G} = \{g_1, g_2\}$ be an ensemble of Gaussian random
variables satisfying the following conditions:

    $E[\ell_i] = E[g_i], \qquad E[\ell_i^2] = E[g_i^2], \qquad E[\ell_i \ell_j] = E[g_i g_j] \qquad \forall i, j \in \{1, 2\}.$

Let $K = \log(1/\alpha)$. Let $F$ denote a multilinear polynomial and let $H = T_{1-\varepsilon}F$. Let
the variance of $H$, $\mathrm{Var}[H]$, be bounded by 1 and let all the influences be smaller than $\tau$, i.e.,
$\mathrm{Inf}_i(H) \le \tau$ for all $i$.

If $\Psi : \mathbb{R}^2 \to \mathbb{R}$ is a Lipschitz-continuous function with Lipschitz constant $C_0$ (with
respect to the $L_2$ norm), then

    $\left| E[\Psi(H(\mathcal{L}^R))] - E[\Psi(H(\mathcal{G}^R))] \right| \le C \cdot C_0 \cdot \tau^{\varepsilon/18K} = o_\tau(1)$

for some constant $C$.

Construction. Let $G = (V, E)$ be an arbitrary instance of Max Bisection. Let
$\mathcal{V} = \{v_{i,0}, v_{i,1}\}_{i \in V}$ denote a globally uncorrelated feasible SDP solution for two rounds
of the Lasserre hierarchy. Specifically, for every pair of vertices $i, j \in V$, there exists
a distribution over $\{\pm 1\}$ assignments that matches the SDP inner products. In other
words, there exist $\{\pm 1\}$-valued random variables $z_i, z_j$ such that

    $\langle v_i, v_j\rangle = E[z_i \cdot z_j].$

Furthermore, the correlation between a random pair of vertices is at most $\delta$, i.e.,

    $E_{i,j \in V}[I(z_i; z_j)] \le \delta.$

Starting from $G = (V, E)$ along with the SDP solution $\mathcal{V}$ and a parameter $\varepsilon$, we
construct a dictatorship test $\mathrm{DICT}^{\varepsilon}_{\mathcal{V}}$. The dictatorship test gadget is exactly the same as
the construction by Raghavendra [Rag08] for the Max Cut problem. For the sake of
completeness, we include the details below.
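$\mathrm{DICT}^{\varepsilon}_{\mathcal{V}}$ (Max Bisection). The set of vertices of $\mathrm{DICT}^{\varepsilon}_{\mathcal{V}}$ consists of the $R$-dimensional
hypercube $\{\pm 1\}^R$. The distribution of edges in $\mathrm{DICT}^{\varepsilon}_{\mathcal{V}}$ is the one induced by the
following sampling procedure:

- Sample an edge $e = (v_i, v_j) \in E$ in the graph $G$.

- Sample $R$ times independently from the distribution $\mu_e$ to obtain $z_i^R =
(z_i^{(1)}, \ldots, z_i^{(R)})$ and $z_j^R = (z_j^{(1)}, \ldots, z_j^{(R)})$, both in $\{\pm 1\}^R$.

- Perturb each coordinate of $z_i^R$ and $z_j^R$ independently with probability $\varepsilon$ to obtain
$\tilde{z}_i^R, \tilde{z}_j^R$ respectively. Formally, for each $\ell \in [R]$,

    $\tilde{z}_i^{(\ell)} = \begin{cases} z_i^{(\ell)} & \text{with probability } 1 - \varepsilon \\ \text{random sample from distribution } \mu_i & \text{with probability } \varepsilon \end{cases}$

- Output the edge $(\tilde{z}_i^R, \tilde{z}_j^R)$.

The weights on the vertices of $\mathrm{DICT}^{\varepsilon}_{\mathcal{V}}$ are given by

    $W(x) = E_{i \in V}\, P_{z \sim \mu_i^R}[z = x].$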
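The sampling procedure above translates directly into code. The following sketch is illustrative only: it assumes edges of $G$ are sampled uniformly, represents each local distribution as a dictionary of probabilities, and uses hypothetical names (`sample_dict_edge`, `mu_edge`, `mu_vertex`) that do not appear in this work.

```python
import random

def sample_dict_edge(edges, mu_edge, mu_vertex, R, eps, rng):
    """Sample one edge of the dictatorship test.
    edges: list of (i, j); mu_edge[(i, j)]: dict {(a, b): prob} over {±1}^2;
    mu_vertex[i]: dict {a: prob} over {±1} (the marginal of vertex i)."""
    i, j = rng.choice(edges)                              # sample an edge of G
    pairs, weights = zip(*mu_edge[(i, j)].items())
    draws = rng.choices(pairs, weights=weights, k=R)      # R samples from mu_e
    z_i = [a for a, _ in draws]
    z_j = [b for _, b in draws]

    def perturb(z, marginal):
        # Each coordinate is resampled from the marginal with probability eps.
        vals, w = zip(*marginal.items())
        return tuple(rng.choices(vals, weights=w)[0] if rng.random() < eps else c
                     for c in z)

    return perturb(z_i, mu_vertex[i]), perturb(z_j, mu_vertex[j])

# Usage with a single made-up edge whose endpoints always disagree.
edges = [(0, 1)]
mu_edge = {(0, 1): {(1, -1): 0.5, (-1, 1): 0.5}}
mu_vertex = {0: {1: 0.5, -1: 0.5}, 1: {1: 0.5, -1: 0.5}}
z_i, z_j = sample_dict_edge(edges, mu_edge, mu_vertex, R=5, eps=0.0,
                            rng=random.Random(1))
print(z_i, z_j)
```

With `eps = 0` the two endpoints are exact negations of each other here; a positive `eps` decorrelates them coordinate by coordinate.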



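We will show the following theorem about the completeness and soundness of the
dictatorship test.

Theorem 6.4. There exist absolute constants $C, K$ such that for all $\varepsilon, \tau \in [0, 1]$ there
exists $\delta$ such that the following holds. Given a graph $G$ and a $\delta$-independent SDP solution
$\mathcal{V} = \{v_{i,0}, v_{i,1} \mid i \in V\}$ for the two-round Lasserre SDP for Max Bisection, the dictatorship
test $\mathrm{DICT}^{\varepsilon}_{\mathcal{V}}$ is such that

- The dictator cuts are bisections with value within $2\varepsilon$ of the SDP value, i.e.,
$\mathrm{Completeness}(\mathrm{DICT}^{\varepsilon}_{\mathcal{V}}) \ge \mathrm{val}(\mathcal{V}) - 2\varepsilon$.

- If $\mathcal{F} : \{\pm 1\}^R \to [-1, 1]$ is a bisection of $\mathrm{DICT}^{\varepsilon}_{\mathcal{V}}$ (that is, $E_{z \sim W}[\mathcal{F}(z)] = 0$) and all its
influences are at most $\tau$, i.e.,

    $\mathrm{Inf}^{\mu_i}_{\ell}(\mathcal{F}) \le \tau \qquad \forall i \in V,\ \ell \in [R],$

then

    $\mathrm{DICT}^{\varepsilon}_{\mathcal{V}}(\mathcal{F}) \le \mathrm{opt}(G) + C\tau^{K\varepsilon}.$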

Proof. The analysis of the dictatorship test is along the lines of the corresponding proof
for Max Cut in [Rag08].

Completeness. First, the dictatorship test gadget is exactly the same as that constructed
for Max Cut in [Rag08]. Therefore from [Rag08], the fraction of edges cut
by the dictators is at least $\mathrm{val}(\mathcal{V}) - 2\varepsilon$. To finish the proof of completeness, we need
to show that the dictator cuts are indeed balanced. This is an easy calculation,
since the balance of the $\ell$-th dictator cut is given by

    $E_{x \sim W}[x^{(\ell)}] = E_{i \in V}\, E_{x \sim \mu_i^R}[x^{(\ell)}] = E_{i \in V}\, E_{a \sim \mu_i}[a] = 0,$

where the last equality uses the fact that the SDP solution satisfies the balance condition.

Soundness. Let $\mathcal{F} : \{\pm 1\}^R \to [-1, 1]$ be a balanced cut all of whose influences are
at most $\tau$. As in [Rag08], we will use the function $\mathcal{F}$ to round the SDP solution $\mathcal{V}$.
The rounding algorithm is exactly the same as the one in [Rag08]. For the sake of
completeness, we reproduce the rounding scheme below.



$\mathrm{Round}_{\mathcal{F}}$ Scheme

Truncation Function. Let $f_{[-1,1]} : \mathbb{R} \to [-1, 1]$ be a Lipschitz-continuous function
such that for all $x \in [-1, 1]$, $f_{[-1,1]}(x) = x$. Let $C_0$ denote the Lipschitz constant of the
function $f_{[-1,1]}$.

Bias. For each vertex $i \in V$, let the bias of vertex $i$ be $\theta_i = \langle v_{i,0}, I\rangle$ and let $w_i =
v_{i,0} - \langle v_{i,0}, I\rangle I$ be the component of $v_{i,0}$ orthogonal to the vector $I$.

Scheme. Sample $R$ vectors $\zeta^{(1)}, \ldots, \zeta^{(R)}$ with each coordinate being an i.i.d. normal
random variable.
For each $i \in V$ do

- For all $1 \le j \le R$, compute the projection $g_i^{(j)}$ of the vector $w_i$ as follows:

    $g_i^{(j)} = \theta_i + \langle w_i, \zeta^{(j)}\rangle,$

and let $g_i = (g_i^{(1)}, \ldots, g_i^{(R)})$.

- Let $F_i$ denote the multilinear polynomial corresponding to the function $\mathcal{F}$ under
the distribution $\mu_i^R$ and let $H_i = T_{1-\varepsilon}F_i$. Evaluate $H_i$ with $g_i$ as inputs to obtain

    $p_i = H_i(g_i^{(1)}, \ldots, g_i^{(R)}).$

- Round $p_i$ to $p_i^* \in [-1, 1]$ by using the Lipschitz-continuous truncation function
$f_{[-1,1]} : \mathbb{R} \to [-1, 1]$:

    $p_i^* = f_{[-1,1]}(p_i).$

- Assign the vertex $i$ to be 1 with probability $(1 + p_i^*)/2$ and $-1$ with the remaining
probability.

Let $\mathrm{Round}_{\mathcal{F}}(\mathcal{V})$ denote the expected value of the cut returned by the rounding
scheme $\mathrm{Round}_{\mathcal{F}}$ on the SDP solution $\mathcal{V}$ for the Max Bisection instance $G$.
Again, by appealing to the soundness analysis in [Rag08], we conclude that the
fraction of edges cut by the resulting partition is lower bounded by

    $\mathrm{Round}_{\mathcal{F}}(\mathcal{V}) \ge \mathrm{DICT}^{\varepsilon}_{\mathcal{V}}(\mathcal{F}) - C'\tau^{K\varepsilon}$

for an absolute constant $C'$. To finish the proof, we need to argue that if the SDP
solution $\mathcal{V}$ is $\delta$-independent, then the resulting partition is close to balanced with high
probability.

First, note that the expected balance of the cut is given by

    $E_{i \in V}\, E_{\zeta}[p_i^*] = E_{i \in V}\left( E_{\zeta}\left[f_{[-1,1]}(H_i(g_i))\right] \right).$

Fix a vertex $i \in V$. By construction, the random variables $z_i^{(\ell)} \sim \mu_i$ and $g_i^{(\ell)}$ have
matching moments up to order two for each $\ell \in [R]$. Therefore, applying the invariance
principle of Isaksson and Mossel [IM09] with the smooth function $f_{[-1,1]}$ and the
multilinear polynomial $F_i$ yields the following inequality:

    $E_{\zeta}\left[f_{[-1,1]}(H_i(g_i))\right] \le E_{z_i^R}\left[f_{[-1,1]}(H_i(z_i^R))\right] + C\tau^{K\varepsilon}.$

Since the cut $\mathcal{F}$ is balanced we can write

    $E_{i \in V}\, E_{z_i^R}\left[f_{[-1,1]}(H_i(z_i^R))\right] = E_{i \in V}\, E_{z_i^R}\left[H_i(z_i^R)\right] = E_{i \in V}\, E_{z_i^R}\left[F_i(z_i^R)\right] = E_{i \in V}\, E_{z_i^R}\left[\mathcal{F}(z_i^R)\right] = 0.$

In the previous calculation, the first equality uses the fact that $f_{[-1,1]}(x) = x$ for
$x \in [-1, 1]$, while the second equality uses the fact that $E_z[T_{1-\varepsilon}F_i(z)] = E_z[F_i(z)]$.
Therefore, we get the following bound on the expected value of the balance of the cut:

    $E_{i \in V}\, E_{\zeta}\left[f_{[-1,1]}(H_i(g_i))\right] \le C\tau^{K\varepsilon}.$

Finally, we will show that the balance of the cut is concentrated around its expectation.
To this end, we first show the following continuity of the rounding algorithm.

Lemma 6.5. For each $i \in V$ and any vector $w_i'$ satisfying $\|w_i'\|_2 = \|w_i\|_2$, if $p_i'$ denotes
the output of the rounding scheme $\mathrm{Round}_{\mathcal{F}}$ with $w_i'$ instead of $w_i$, then

    $E_{\zeta}\left[(p_i' - p_i^*)^2\right] \le C(R)\,\|w_i - w_i'\|_2^2$

for some function $C(R)$ of $R$ ($C(R) = 2^{2R}$ suffices).

Proof. Let $g_i' = (g_i'^{(1)}, \ldots, g_i'^{(R)})$ denote the projections of the vector $w_i'$ along the
directions $\zeta^{(1)}, \zeta^{(2)}, \ldots, \zeta^{(R)}$. The output of the rounding scheme on $w_i'$ is given by
$p_i' = f_{[-1,1]}(H_i(g_i'))$. Recall that the output of the rounding scheme is given by
$p_i^* = f_{[-1,1]}(H_i(g_i))$.

The result is a consequence of the fact that the function $f_{[-1,1]} \circ H_i$ is Lipschitz
continuous. Since the variance of $\mathcal{F}(z_i^R)$ is at most 1, the sum of squares of coefficients
of $H_i$ is at most 1. Therefore, all the $2^R$ coefficients of $H_i$ are bounded by 1 in absolute
value.

The proof is a simple hybrid argument, where we replace $g_i^{(\ell)}$ by $g_i'^{(\ell)}$ one by one.
The details of the proof are deferred to the full version. □

Lemma 6.6. For every $i, j$,

    $\left| E[p_i^* p_j^*] - E[p_i^*]\,E[p_j^*] \right| \le C(R)\,|\langle w_i, w_j\rangle|$

for some function $C(R)$ of $R$ ($C(R) = 100 \cdot 2^{2R}$ suffices).

Proof Set w'j = wj - {Wi,Wj}-^^ + (Wi,Wj)u for a unit vector u orthogonal to W( 
and $w_j$. Note that $w_j'$ is orthogonal to $w_i$ and satisfies $\|w_j - w_j'\|_2 \le 4|\langle w_i, w_j\rangle|$. Let $p_j'$
denote the output of the rounding with $w_j'$ instead of $w_j$. Since $w_j'$ is orthogonal to $w_i$,
all their projections are independent random variables, which implies that

    $E[p_j'\, p_i^*] = E[p_j']\,E[p_i^*].$

Moreover, by Lemma 6.5 we have

    $E\left[(p_j' - p_j^*)^2\right] \le C(R)\,\|w_j - w_j'\|_2^2 \le C(R) \cdot 16\,|\langle w_i, w_j\rangle|^2.$

Combining these inequalities and using Cauchy-Schwarz, we finish the proof as follows:

    $\left| E[p_i^* p_j^*] - E[p_i^*]\,E[p_j^*] \right| \le \left| E[p_i^*(p_j^* - p_j')] \right| + \left| E[p_i^*]\,E[p_j' - p_j^*] \right|$
    $\le 2\left(E\left[(p_j^* - p_j')^2\right]\right)^{1/2}\left(E\left[(p_i^*)^2\right]\right)^{1/2}$
    $\le 8\,C(R)\,|\langle w_i, w_j\rangle|$



□ 



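To finish the proof, we now bound the variance of the balance of the cut returned,
using Lemma 6.6. The variance of the balance of the cut returned is given by

    $E_{\zeta}\left[\left(E_{i \in V}[p_i^*]\right)^2\right] - \left(E_{\zeta}\, E_{i \in V}[p_i^*]\right)^2 = E_{i,j \in V}\left[ E_{\zeta}[p_i^* p_j^*] - E_{\zeta}[p_i^*]\,E_{\zeta}[p_j^*] \right] \le C(R)\, E_{i,j \in V}\left[|\langle w_i, w_j\rangle|\right].$

For a $\delta$-independent SDP solution, the above quantity is at most $C(R)\,\mathrm{poly}(\delta)$. This gives
the desired result. □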
A Analysis of Cut Value

We analyze the rounding algorithm in an indirect way. First we show that under certain
conditions, Algorithm 5.3 returns a better cut than the Goemans-Williamson algorithm
(in expectation). Then we use a union-bound type argument to give the proof for
the general case.

First, we present a bound on the tail of the standard Gaussian distribution.

Lemma A.1. For $t > 0$,

    $\Phi^c(t) = 1 - \Phi(t) \le \sqrt{\frac{2}{\pi}}\; \frac{e^{-t^2/2}}{t + \sqrt{t^2 + 8/\pi}}.$

Proof. We apply the following bound on the error function given in [Kom55]:

    $e^{x^2} \int_x^{\infty} e^{-y^2}\, dy \le \frac{1}{x + \sqrt{x^2 + 4/\pi}}.$

By replacing $x$ with $t/\sqrt{2}$ we get the desired bound.
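The tail bound of Lemma A.1 is easy to check numerically. The sketch below is an illustration using only the Python standard library; the grid of test points and the function names are arbitrary choices.

```python
import math

def gaussian_tail(t):
    """1 - Phi(t) for the standard normal, computed via the complementary error function."""
    return 0.5 * math.erfc(t / math.sqrt(2))

def tail_upper_bound(t):
    """Right-hand side of Lemma A.1."""
    return (math.sqrt(2 / math.pi) * math.exp(-t * t / 2)
            / (t + math.sqrt(t * t + 8 / math.pi)))

grid = [k / 100 for k in range(1, 601)]            # t = 0.01, 0.02, ..., 6.00
worst_slack = min(tail_upper_bound(t) - gaussian_tail(t) for t in grid)
print(worst_slack)
```

The slack stays positive on the whole grid; at $t = 0$ both sides equal $1/2$, so the bound is tight there.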



From now on, let fi = yjl - A In 2 x 0.7712 and to = <D _1 (/W2 + 1/2) * 1.2034. 

Lemma A.2. Let $g(t) = e^{t^2/2}(1 - \mu^2(t))$, where $\mu(t) = 2\Phi(t) - 1$. Then $g(t)$ is decreasing when
$t \ge t_0$.

Proof. By simple calculation, we get

    $g'(t) = 4\left( t e^{t^2/2}(1 - \Phi(t))\Phi(t) + \frac{1}{\sqrt{2\pi}}(1 - 2\Phi(t)) \right).$

We want to show

    $t e^{t^2/2}(1 - \Phi(t))\Phi(t) + \frac{1}{\sqrt{2\pi}}(1 - 2\Phi(t)) \le 0.$

By applying Lemma A.1, we only need to show

    $\sqrt{\frac{2}{\pi}}\; \frac{t}{t + \sqrt{t^2 + 8/\pi}}\; \Phi(t) + \frac{1}{\sqrt{2\pi}}(1 - 2\Phi(t)) \le 0.$

By simplification, we get

    $2\Phi(t) - 1 \ge \frac{t}{\sqrt{t^2 + 8/\pi}}.$

By applying the lemma again and further simplification, we get 



.2 j t> 

e 1 - r > - 
This can easily be verified for $t = t_0$. Also, the LHS is increasing when $t \ge t_0$, therefore
the lemma follows.

□ 
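The monotonicity claimed in Lemma A.2 can be checked on a grid. The sketch below is an illustration, not part of the proof: it uses the identity $1 - \mu^2(t) = 4\Phi(t)(1 - \Phi(t))$ to avoid cancellation for large $t$, takes $t_0 \approx 1.2034$ from the text, and the grid range is an arbitrary choice.

```python
import math

def Phi(t):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-t / math.sqrt(2))

def tail(t):
    """1 - Phi(t), computed without cancellation."""
    return 0.5 * math.erfc(t / math.sqrt(2))

def g(t):
    """g(t) = e^{t^2/2} (1 - mu(t)^2) with mu(t) = 2 Phi(t) - 1,
    rewritten as 4 e^{t^2/2} Phi(t) (1 - Phi(t))."""
    return 4 * math.exp(t * t / 2) * Phi(t) * tail(t)

t0 = 1.2034
grid = [t0 + 0.01 * k for k in range(480)]          # t0 up to roughly 6
values = [g(t) for t in grid]
is_decreasing = all(a > b for a, b in zip(values, values[1:]))
print(is_decreasing, values[0], values[-1])
```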

Lemma A.3. Let $f_1(x)$ and $f_2(x)$ be twice differentiable decreasing functions defined on
$[0, \infty)$ satisfying the following conditions:

1. $f_1(0) = f_2(0)$

2. $\lim_{x \to \infty} f_1(x) = \lim_{x \to \infty} f_2(x)$

3. $\lim_{x \to 0} \frac{f_1'(x)}{f_2'(x)} > 1$

4. $\frac{f_1'(x)}{f_2'(x)} = 1$ has only one solution

Then

    $f_1(x) \le f_2(x), \qquad \forall x \ge 0.$

Proof. For the sake of contradiction, assume there exists $x_0$ such that $f_1(x_0) > f_2(x_0)$.
By the mean value theorem, there exists $x_1 < x_0$ such that $f_1'(x_1) > f_2'(x_1)$, which
means $\frac{f_1'(x_1)}{f_2'(x_1)} < 1$ (since both $f_1'$ and $f_2'$ are negative). By the fourth assumption, for any
$x > x_0 > x_1$, $f_1'(x) > f_2'(x)$, therefore $f_1(x) - f_2(x) > f_1(x_0) - f_2(x_0) > 0$, contradicting
the second assumption. □

Now we show the key lemma in this section. 

Lemma A.4. Let $u = \mu I + w_1$ and $v = \mu I + w_2$ be two unit vectors with the same
projection on the direction of $I$. Also assume that $\langle \bar{w}_1, \bar{w}_2\rangle = 1 - \rho \ge 0$, where $\bar{w}_1$ and
$\bar{w}_2$ are the normalized vectors of $w_1$ and $w_2$. Then the probability that these two vectors
are separated by a random hyperplane is at least the probability that these two vectors
are cut by Algorithm 5.3.

Proof. First notice that since $u$ and $v$ have the same bias $\mu$, they will be assigned the
same threshold $t = \Phi^{-1}(\mu/2 + 1/2)$ in Algorithm 5.3.

Henceforth, we fix $\langle \bar{w}_1, \bar{w}_2\rangle = 1 - \rho \ge 0$, and express the probabilities as a function of
$\mu$ and $t$. We stress that $\mu$ and $t$ are fully dependent on each other, therefore the functions
are only single-variable functions. We use both $\mu$ and $t$ (and other notations that are
about to be introduced) in the expressions only for simplicity.

Let $\varepsilon = (1 - \mu^2)\rho$, which characterizes $\langle u, v\rangle$ as a function of $\mu$, i.e.,

    $\langle u, v\rangle = \left\langle \mu I + \sqrt{1 - \mu^2}\,\bar{w}_1,\; \mu I + \sqrt{1 - \mu^2}\,\bar{w}_2 \right\rangle = 1 - \varepsilon.$

Let $H(t)$ be the probability of the two vectors being separated by a random hyperplane.
It is well known that [GW95]

    $H(t) = \arccos(u \cdot v)/\pi = \arccos(1 - \varepsilon)/\pi.$

For Algorithm 5.3, notice that $\bar{w}_1 \cdot g$ and $\bar{w}_2 \cdot g$ are two jointly distributed standard
Gaussian variables with covariance matrix $\Sigma = \begin{pmatrix} 1 & 1-\rho \\ 1-\rho & 1 \end{pmatrix}$. Thus the probability of $u$
and $v$ being separated by Algorithm 5.3 is

    $B(t) = 2 \int_{-\infty}^{t} \int_{t}^{\infty} \frac{1}{2\pi\sqrt{\det \Sigma}}\; e^{-\frac{1}{2} x^{\top} \Sigma^{-1} x}\; dx_1\, dx_2, \qquad x = (x_1, x_2)^{\top}.$

It is easy to see that when $\mu = t = 0$, these two rounding schemes are equivalent, thus
$B(0) = H(0)$. Also $\lim_{t \to \infty} B(t) = \lim_{t \to \infty} H(t) = 0$. The derivatives of $H(t)$ and $B(t)$ are
as follows:

H \t) = 2 Jif = <h {t )e-< 1 l 2 

^3/2 V2e - e 2 

and 



fl'(f) = -J-®(af)e 
where a = , p < 1 when p < 1 , and 6(0 is defined as 



6(0 = O(0 - O(-0 

Let /(0 = Notice that /(0) = tt/2 > 1, thus by Lemma A.3, we only have to 
show that f(t) - 1 has only one solution. Moreover, it suffices to show that f'(t) < 
when /(0 < 1. 

Notice that when f(t) < 1 , we have 
V2e - e 2 6(a0 < 2 



V2p-p 2 a*W 77 

2e-e 2 4 ~ 6(a0 

=> < — (By convexity of O, — : — > 1 when a < 1) 

2p-p z n 1 ' ' aO(0 

el- £ 4 
=> -o < - 

P 2 - p 7T Z 



9 2-p 4 /e , , 

^(1-p 2 )— £<_ - = 1V 

2 - £ 7T Z \p 



By calculation, one can show that 

,2 



-JTJne- 1 12 V2e - £ 2 / l-£ ~ n „2 V 2 /9 OtoV 



that 



O(0 \2 £ -£ 2 rr 0(0 / 

Now we show fit) < when t > ?o- m order to show this, one only needs to show 

1 - £ ~ O(a0 n „l\fin 

r (2,up)O(a0 + > e {X ~ a ){ 12 a 



2e-e 2 0(0 
By substituting $\varepsilon = (1 - \mu^2)\rho$ and simplification, we get

'-2p 2 + l-fi 2 \>e 



0(at) 1 / 1 -e 2 , 1 2 \ ^ M - a 2 )t 2 /2 



aO(0 1 -// 2 \2 - e 
Since > 1 when a < 1 and g^" ")' 2 / 2 <g e^'' 2 , it suffices to show 

a<P(f) 

2/^ + 1 vW /2 (l V) 
2 - £ j 

holds when t > t^. 

By Lemma A.2, we know that RHS is decreasing when t > ?o- Now we show LHS 
is increasing when fi ^ po. It can be shown that the derivative of LHS is 

2;up(l - p 2 )p 2 - (2p - V)(2 - e) > -p(2 - 4p 2 )(2 - e) > 

when p> po- 

Now we only have to verify the inequality when $t = t_0$, and that can be done numerically.
The calculation shows that $\mathrm{LHS}(t_0) \approx 0.8489$ while $\mathrm{RHS}(t_0) \approx 0.836$.

□ 
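The comparison in Lemma A.4 can also be explored numerically. The sketch below is an illustration and not the computation used in this work. It evaluates the separation probability of Algorithm 5.3 for equal biases through the standard bivariate-normal identity $\partial \Phi_2(t, t; r)/\partial r = \phi_2(t, t; r)$, which after the substitution $r = \cos\theta$ gives $B = \frac{1}{\pi}\int_0^{\arccos c} e^{-t^2/(1 + \cos\theta)}\, d\theta$ with $c = \langle \bar{w}_1, \bar{w}_2\rangle$. This closed form is an assumption of the sketch (it does not appear in the text above), as are the function names and parameter values.

```python
import math
from statistics import NormalDist

def hyperplane_cut_probability(mu, corr):
    """H: separation probability of plain hyperplane rounding, arccos(<u, v>) / pi,
    where <u, v> = mu^2 + (1 - mu^2) * corr and corr = <w1_bar, w2_bar>."""
    return math.acos(mu * mu + (1 - mu * mu) * corr) / math.pi

def shifted_cut_probability(mu, corr, steps=4000):
    """B: separation probability of the shifted rounding with common threshold
    t = Phi^{-1}(mu/2 + 1/2), via a midpoint rule on the integral above."""
    t = NormalDist().inv_cdf(mu / 2 + 0.5)
    h = math.acos(corr) / steps
    total = sum(math.exp(-t * t / (1 + math.cos((m + 0.5) * h))) for m in range(steps))
    return total * h / math.pi

for mu, corr in [(0.0, 0.5), (0.5, 0.9), (0.8, 0.5)]:
    print(mu, corr, shifted_cut_probability(mu, corr), hyperplane_cut_probability(mu, corr))
```

With zero bias the two probabilities coincide, and for the biased examples the shifted rounding separates the pair less often, in line with the lemma.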



Finally, we show Lemma 5.4.

Lemma A.5. (Restatement of Lemma 5.4) Let $u = \mu_1 I + w_1$ and $v = \mu_2 I + w_2$ be two
unit vectors satisfying $\|u - v\|^2/4 \le \varepsilon$. Then the probability of them being separated
by Algorithm 5.3 is at most $O(\sqrt{\varepsilon})$.

Proof. (Proof of Lemma 5.4)

First we prove the case when $\mu_1 = \mu_2 = \mu$. Notice that when $\langle w_1, w_2\rangle \ge 0$, the
lemma follows from Lemma A.4 and the fact that the Goemans-Williamson algorithm will
separate $u$ and $v$ with probability $O(\sqrt{\varepsilon})$ [GW95].

If $\langle w_1, w_2\rangle < 0$, then $\|u - v\|^2/4 = \|w_1 - w_2\|^2/4 \ge (\|w_1\|^2 + \|w_2\|^2)/4 = (1 - \mu^2)/2$.
Hence $|\mu| \ge 1 - O(\sqrt{\varepsilon})$. By the union bound, the probability of the algorithm separating $u$
and $v$ is at most $O(\sqrt{\varepsilon})$.

Now we consider the case when $\mu_1 \neq \mu_2$; w.l.o.g. we may assume $|\mu_1| \ge |\mu_2|$. We
construct an auxiliary vector $v'$ as follows: $v' = \mu_1 I + \sqrt{1 - \mu_1^2}\,\bar{w}_2$. It is easy to see that
$\|u - v'\| \le \|u - v\|$. Let $F$ denote the rounding function. We analyze the probability of $u$
and $v$ being separated as follows:

    $P(F(u) \neq F(v))$
    $= P(F(u) \neq F(v),\, F(v') = F(v)) + P(F(u) \neq F(v),\, F(v') \neq F(v))$
    $\le P(F(u) \neq F(v')) + P(F(v') \neq F(v))$

Since $\|u - v'\| \le \|u - v\|$ and $\langle u, I\rangle = \langle v', I\rangle = \mu_1$, by the first part of the proof
$P(F(u) \neq F(v')) \le O(\sqrt{\varepsilon})$. Also,

    $P(F(v') \neq F(v)) \le |\mu_1 - \mu_2|/2 \le \|u - v\|/2 \le O(\sqrt{\varepsilon}).$

Therefore the lemma follows. □



B Mutual Information, Statistical Distance and Independence

Intuitively, when two random variables have low mutual information, they should be
close to being independent. In this section we formalize this intuition by giving an
explicit bound on the statistical distance between the joint distribution and the independent
distribution. We stress that all the results here are sufficient for our use in this work, but
we believe the parameters could be further optimized.

We start by defining a few notions that measure the correlation of two random
variables.

Definition B.1. Let $\Omega$ be a finite sample space, and let $P$ and $Q$ be two probability distributions
on $\Omega$. The square Hellinger distance of $P$ and $Q$ is defined as

    $H^2(P, Q) = \frac{1}{2} \sum_{x \in \Omega} \left( \sqrt{P(x)} - \sqrt{Q(x)} \right)^2.$

Definition B.2. Let $\Omega$ be a finite sample space, and let $P$ and $Q$ be two probability distributions
on $\Omega$. The Kullback-Leibler divergence of $P$ and $Q$ is defined as

    $D_{KL}(P \| Q) = \sum_{x \in \Omega} P(x) \log \frac{P(x)}{Q(x)}.$

Now we give a few facts regarding mutual information, Hellinger distance and
Kullback-Leibler divergence without proving them.

Fact B.3. Let $X$ and $Y$ be two jointly distributed random variables taking values in $[q]$.
Then

    $I(X; Y) = D_{KL}\left( p(x, y) \,\|\, p(x) \times p(y) \right),$

where $p(x, y)$ is the joint distribution of $X$ and $Y$ on $[q]^2$ and $p(x) \times p(y)$ is the product
distribution of the marginal distributions of $X$ and $Y$.

Fact B.4. Let $\Omega$ be a finite sample space, and let $P$ and $Q$ be two probability distributions on $\Omega$.
Then

    $D_{KL}(Q \| P) \ge \frac{2}{\ln 2}\, H^2(P, Q).$

Combining the facts mentioned above, we get the following relation between mutual
information and statistical distance.
Fact B.5. (Restatement of Fact 4.3) Let $X$ and $Y$ be two jointly distributed random
variables on $[q]$. Then

    $I(X; Y) \ge \frac{1}{2 \ln 2} \sum_{i,j \in [q]} \left( P(X = i, Y = j) - P(X = i)P(Y = j) \right)^2.$

In particular, for all $i, j \in [q]$,

    $|P(X = i, Y = j) - P(X = i)P(Y = j)| \le \sqrt{2 I(X; Y)}.$

As a consequence, if $X$ and $Y$ are two random variables defined on $\{-1, 1\}$, then $\mathrm{Cov}(X, Y) \le
O\left(\sqrt{I(X; Y)}\right)$.

Proof.

    $I(X; Y) = D_{KL}\left( p(x, y) \,\|\, p(x) \times p(y) \right)$
    $\ge \frac{2}{\ln 2}\, H^2\left( p(x, y),\, p(x) \times p(y) \right)$
    $= \frac{1}{\ln 2} \sum_{i,j \in [q]} \left( \sqrt{P(X = i, Y = j)} - \sqrt{P(X = i)P(Y = j)} \right)^2$
    $= \frac{1}{\ln 2} \sum_{i,j \in [q]} \left( \frac{P(X = i, Y = j) - P(X = i)P(Y = j)}{\sqrt{P(X = i, Y = j)} + \sqrt{P(X = i)P(Y = j)}} \right)^2$

Upper bounding $\ln 2$ by 1 finishes the proof.
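Facts B.4 and B.5 can be spot-checked on random joint distributions. The sketch below is illustrative only: the alphabet size, the number of trials, the random seed and the function names are arbitrary assumptions, and mutual information is measured in bits, matching the $\ln 2$ factors above.

```python
import math
import random

def marginals(joint, q):
    px = [sum(joint[i][j] for j in range(q)) for i in range(q)]
    py = [sum(joint[i][j] for i in range(q)) for j in range(q)]
    return px, py

def mutual_information_bits(joint, q):
    """I(X;Y) = D_KL(joint || product of marginals), in bits."""
    px, py = marginals(joint, q)
    return sum(joint[i][j] * math.log2(joint[i][j] / (px[i] * py[j]))
               for i in range(q) for j in range(q) if joint[i][j] > 0)

def squared_hellinger_to_product(joint, q):
    """H^2 between the joint law and the product of its marginals."""
    px, py = marginals(joint, q)
    return 0.5 * sum((math.sqrt(joint[i][j]) - math.sqrt(px[i] * py[j])) ** 2
                     for i in range(q) for j in range(q))

def max_deviation(joint, q):
    px, py = marginals(joint, q)
    return max(abs(joint[i][j] - px[i] * py[j]) for i in range(q) for j in range(q))

rng = random.Random(0)
q = 3
violations = 0
for _ in range(200):
    raw = [[rng.random() for _ in range(q)] for _ in range(q)]
    total = sum(map(sum, raw))
    joint = [[x / total for x in row] for row in raw]
    info = mutual_information_bits(joint, q)
    if info < 2 / math.log(2) * squared_hellinger_to_product(joint, q) - 1e-12:
        violations += 1                      # would contradict Fact B.4
    if max_deviation(joint, q) > math.sqrt(2 * info) + 1e-12:
        violations += 1                      # would contradict Fact B.5
print(violations)
```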